• Open

    [P] How to do multivariate time series classification using C# and either the Accord.NET or Encog libraries?
    I have a time series based on financial security prices with additional features. I wish to feed this series into some ML construct in order to perform multi-class classification. Most of the solutions that I found in my search offer predictions. I am not interested in predicting future prices. I merely wish to train the ML construct to offer the most likely class for the given time-series input frame. I am looking for C# solutions or links to tutorials that use either the Accord.NET, Encog or ML.NET libraries. I would be most appreciative for answers that lead my eyes to view C# code that demonstrates a solution to my question. In lieu of the above, I would also appreciate a description of the types of ML constructs that would satisfy my requirements. I have no interest in Python solutions. Please, do not chastise me or praise Python. I need the code to be in C# so that it easily integrates into existing code. Thank you. Edit: I forgot to include ML.NET. submitted by /u/LeftShoeHighway [link] [comments]  ( 1 min )
    [P] I need help finding an AI that tells you what sports career is best for you.
    You ask a set of questions and it will use your answers to tell you what sports career is best for you. I have been having a hard time finding it online. Can anyone lend a hand? It would be greatly appreciated. submitted by /u/Texidork [link] [comments]  ( 1 min )
    [D] AI stocks
    The advances in AI over the last two years have been mind-blowing. I have taken some part-time ML classes just to try to get a grasp, and wow, I'm impressed by what someone like me can do with low coding skills but a high willingness to learn. I have already built a recommendation model to improve my policy paragraphs based on input text from relevant research articles. I have to admit that Grammarly and QuillBot beat my hobby project, but it was a fun run and I have gained a ton of experience. One lesson is that AI is clearly the future, and I want to place some of my investments in AI as a sector. Buy and hold for the future. The stock market is plummeting and will probably continue to do so for a while, but I want to start researching my options. Can you guys share your knowledge of tradable businesses, either pure AI companies or parent companies with control? I'm all for responsible trading, but feel free to share uncertain yolo companies as well. submitted by /u/sikkerhetellersafety [link] [comments]  ( 1 min )
    [P] I made an open-source demo of OpenAI's CLIP model running completely in the browser - no server involved. Compute embeddings for (and search within) a local directory of images, or search 200k popular images from Reddit (as shown in this video). Link to demo and Github repo in comments.
    submitted by /u/joerocca [link] [comments]  ( 2 min )
    Which Alg to use? [R]
    Hey, so I am taking a few images and want to use them to train a model to predict how closely a test image matches the training images. Would a CNN be my best bet, or Haar cascade classifiers? Any other thoughts? I am doing this in Python on Google Colab. Thanks! submitted by /u/Cloverdover1 [link] [comments]  ( 1 min )
    [Project] Volunteers Needed for Ukraine Project
    We are recruiting volunteers for a project that will help Ukraine. This is a data-oriented project, and we can use all the help we can get. We want to work very intensely on this project so we can release it quickly. To join us and help Ukraine, please reach out to [breaker25789@gmail.com](mailto:breaker25789@gmail.com) with your name, email, and the team you are interested in. Data Team · No prior skills necessary. New volunteers will receive training in identifying soldiers and military equipment upon joining our team. · This role takes a minimum of five (5) hours a week. · Minimum Age: 18+ · CONTENT WARNING: The primary role of a member of the Data Team is to directly interact with photos and videos from the war in Ukraine, which often contain graphic images of violence and death. Machine Learning Team · Each volunteer needs to be able to dedicate a minimum of ten (10) hours a week. · Preferred prior experience includes familiarity with Docker, AWS SageMaker and S3, machine learning attacks, machine learning security, dedicated red team work, and/or data science. · Minimum Age: 18+ · CONTENT WARNING: Individuals directly involved in training certain algorithms will be exposed to photos and videos from the war in Ukraine, which often contain graphic violence and death. Please notify us if you would prefer to not see that content. submitted by /u/OttersAreDevilSpawn [link] [comments]  ( 2 min )
    [D] Research Director at Deepmind says all we need now is scaling
    submitted by /u/SnoozeDoggyDog [link] [comments]  ( 5 min )
    [P] Image Fusion Techniques for Image classification Task
    Can anyone recommend sources on image fusion techniques (preferably for RGB and near-infrared images) for image classification tasks? submitted by /u/Antman-007 [link] [comments]
    [D] Taking derivative of Expectation with respect to Phi (Variational Inference)
    The snippet below, from page 20 of the paper here, says that the derivative cannot be taken inside the expectation because the expectation is a function of phi. https://preview.redd.it/jo4e2nt0lgz81.png?width=1148&format=png&auto=webp&s=1f3acb968a07aa3858657770c12689031105a03d However, the paper here (page 3), from the same author, shows the score function estimator being used to estimate the gradient, which takes the derivative inside the expectation even though the expectation is a function of phi. This is highlighted in the snippet below: https://preview.redd.it/fsv8s272lgz81.png?width=940&format=png&auto=webp&s=c5ad403563d1f9c8048f95a413eb188c35e785c3 I am unable to understand the discrepancy. Could anyone please help me get some insight into this? I feel that I am missing something. submitted by /u/That-Mud3051 [link] [comments]  ( 1 min )
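For reference, the standard log-derivative (score function) identity can be written out as follows. The snippets from the two papers are not reproduced here, so the symbols are assumptions: $q_\phi(z)$ is the variational distribution and $f(z)$ is an integrand that does not itself depend on $\phi$. The gradient is not simply pushed through the expectation; it acts on the density, and the result is then rewritten as a new expectation under $q_\phi$.

```latex
\begin{aligned}
\nabla_\phi \, \mathbb{E}_{q_\phi(z)}\!\left[ f(z) \right]
  &= \nabla_\phi \int q_\phi(z)\, f(z)\, dz \\
  &= \int \nabla_\phi q_\phi(z)\, f(z)\, dz \\
  &= \int q_\phi(z)\, \nabla_\phi \log q_\phi(z)\, f(z)\, dz \\
  &= \mathbb{E}_{q_\phi(z)}\!\left[ f(z)\, \nabla_\phi \log q_\phi(z) \right]
\end{aligned}
```

The third line uses $\nabla_\phi q_\phi = q_\phi \nabla_\phi \log q_\phi$. Note that the naive move $\nabla_\phi \mathbb{E}_{q_\phi}[f(z)] = \mathbb{E}_{q_\phi}[\nabla_\phi f(z)]$ would give zero here, which is one way to read the statement that the derivative cannot be taken inside the expectation directly.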
    [D] Best resources to keep up with latest machine learning research
    I am an 'applied' machine learning researcher, i.e. about 80% of my time is on machine learning, 20% is applying it to physics problems. The increasing breadth and depth of new machine learning research is awe-inspiring. I would like to be able to keep up with the newest developments in the field, somehow, without obviously having the time to read all the latest developments. Is there a website, a resource (like weekly or monthly magazines), or community aimed at collating the newest insights and directions and publishing summaries/overviews in digestible formats? submitted by /u/intheprocesswerust [link] [comments]  ( 2 min )
    [R] Predicting the Elections with Deep Learning - Part 1 - Results
    Hi everyone 👋, During the last 18 months, I played (maybe too much) with census & elections data and tried to predict the French elections with deep learning. I'm not officially an ML researcher; this is only a side project, so be nice please 🙏 ! I'm sharing this only to have fun discussions with people who have the same kink as me. 😊 Here it is 👉 Predicting the Elections with Deep Learning - Part 1 - Results You don't have to be a data scientist to read this 1st post, which talks only about the results of the experiment. TLDR of this 1st post: No, I didn't find a way to predict the election results. But it's surprising how much you can learn about voter behavior (who votes for which party), even using only aggregated public data. These are only qualitative results, so it's not real hard science. Still, it's interesting (worrying?) to think about what would be possible with more data (e.g. FAANG). After that, I'll write 2 more posts: 1 for the model implementation and 1 for the MLOps tooling I've used. submitted by /u/qchenevier [link] [comments]  ( 3 min )
    [P] Java implementation of GPT2 tokenizer
    GPT2 Tokenizer Java
    When developing a service using the GPT3 API, we often need to count the number of tokens. However, if you develop a service in Java, it is not easy to count this. GPT3 is known to use the same tokenizer as GPT2, so this should be a huge help for someone. For more detail and code, refer to https://github.com/hyunwoongko/gpt2-tokenizer-java.
    Requirements
    Please install the following dependencies to use the library.
    ```
    implementation 'com.google.api-client:google-api-client:1.32.2'
    implementation 'org.apache.commons:commons-lang3:3.12.0'
    implementation 'org.springframework.boot:spring-boot-starter-web'
    testImplementation 'org.junit.jupiter:junit-jupiter-api:5.3.1'
    testRuntimeOnly 'org.junit.jupiter:junit-jupiter-engine:5.3.1'
    ```
    Add tokenizer files to resources directory
    Please add encoder.json and vocab.bpe files to your project resources directory. These files can be found here.
    Usage
    The following are simple examples of this library. To check test code for this, refer to here.
    Encoding text to tokens
    ```java
    import ai.tunib.tokenizer.GPT2Tokenizer;
    import java.util.List;

    // Load the tokenizer files from the resources directory.
    GPT2Tokenizer tokenizer = GPT2Tokenizer.fromPretrained("PATH/IN/RESOURCES");
    List<Integer> result = tokenizer.encode("Hello my name is Kevin.");
    // [15496, 616, 1438, 318, 7939, 13]
    ```
    Decoding tokens to text
    ```java
    import ai.tunib.tokenizer.GPT2Tokenizer;
    import java.util.List;

    GPT2Tokenizer tokenizer = GPT2Tokenizer.fromPretrained("PATH/IN/RESOURCES");
    String result = tokenizer.decode(List.of(15496, 616, 1438, 318, 7939, 13));
    // "Hello my name is Kevin."
    ```
    License
    This project is licensed under the terms of the Apache License 2.0. Copyright 2022 Hyunwoong Ko. All Rights Reserved. submitted by /u/hyunwoongko [link] [comments]  ( 1 min )
    [D] [R] AdaVAE: Exploring Adaptive GPT-2s in Variational Auto-Encoders for Language Modeling
    https://arxiv.org/abs/2205.05862 It looks promising, what do you think? submitted by /u/carl__11 [link] [comments]
    [D] LayoutLM: Pre-training of Text and Layout for Document Image Understanding (Paper Summary)
    Pre-training of models for NLP applications focuses exclusively on text-level manipulation while neglecting layout and style information that is vital for document image understanding. This paper proposes LayoutLM, which jointly models interactions between text and layout information across scanned document images. It fits very well in use cases like resume parsing, bill parsing, table parsing, etc. Paper Summary: https://youtu.be/ewyDVIdKXm0 Paper Link: https://arxiv.org/abs/1912.13318 submitted by /u/prakhar21 [link] [comments]  ( 1 min )
    [D]Geometric perspective of meta-learning
    I am curious whether there is work explaining the efficiency of meta-learning from a geometric perspective. For example, the famous meta-learning algorithm MAML aims to find parameters of a neural network model that can quickly adapt to new tasks. Intuitively, this is very similar to finding a point in parameter space that is a short distance from the optimal parameters of every training task, i.e. a barycenter under some distance metric, and the adaptation step may then be considered a transport path. This is just a simple (maybe naive) example; it would be great if you could share some related work. Thank you! submitted by /u/mike_shen [link] [comments]  ( 1 min )
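To make the analogy concrete, the two objectives can be placed side by side. This is only a sketch of the intuition in the post: $\theta_i^\ast$ (the optimal parameters of task $i$), the distance $d$, and the squared form of the barycenter objective are assumptions, while the second line is the usual one-step MAML objective with inner learning rate $\alpha$ and task losses $\mathcal{L}_i$.

```latex
\begin{aligned}
\text{barycenter view:} \quad
  \theta_{\mathrm{bary}} &= \arg\min_{\theta} \sum_{i} d\!\left(\theta, \theta_i^\ast\right)^2 \\
\text{MAML (one inner step):} \quad
  \theta_{\mathrm{MAML}} &= \arg\min_{\theta} \sum_{i}
      \mathcal{L}_i\!\left(\theta - \alpha \nabla_\theta \mathcal{L}_i(\theta)\right)
\end{aligned}
```

Read this way, MAML never uses $\theta_i^\ast$ explicitly; the post-adaptation loss plays the role that the distance to each task optimum plays in the barycenter picture.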
    [D] ICML author notifications
    The author notifications for ICML submissions will be released 14 May (Anywhere on Earth time). I thought to start a discussion thread here so we can discuss any issues and share commiserations/celebrations. Note the release date means it could be as late as end of day on 14 May Anywhere on Earth time, effectively meaning 15 or even 16 May for some people, depending on where you live and if there are delays. For reference: the notification of phase 1 rejections were delayed by some hours (around 6 or 12, but not more than 24 iirc); and the author feedback period seemed to open on time, maybe even early. I think the main thing to remember is that no matter the result, your work still has value and you still have value. You are not your work, and any evaluation on the value of your work is just a few people's opinions in a noisy, messy system. You can check Anywhere on Earth time here: https://time.is/Anywhere_on_Earth submitted by /u/tfburns [link] [comments]  ( 3 min )
    [P] Identifying if there are any clusters in a data warehouse
    Hi, I have a lot of structured data whose points should be, and mostly are, random (i.e. we don’t want a cluster of points in space; they should be sparse). The data is basically a dataset of items that had leak issues. The problem statement is to find whether there’s any subset of the data with the same root cause (e.g. the same brand or the same country caused the issue). There’s a mixture of numerical and categorical data, e.g. (size, lat, long) and (country, brand, priority, city). Assume there are around 20+ dimensions (columns). My first approach was to find whether those points form any cluster. To do so, I was going to use a density-based clustering algorithm, since you don’t have to specify the number of clusters and it will just ignore noise (which should be most of the points). But it is hard to preserve the meaning of the columns with dimensionality reduction, and obviously we can’t cluster in 20 dimensions; it would take forever. We don’t know what could cause the leaks, and we don’t know which columns are important. What’s an alternative approach or a better solution? Thanks (Note: my post got taken down, so I’m reposting.) submitted by /u/micdean19 [link] [comments]  ( 2 min )
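One option sometimes used for mixed numerical and categorical columns is a Gower-style distance, which a density-based method could consume as a precomputed distance without any dimensionality reduction. The sketch below is a minimal illustration, not something from the post: the function name, the row layout (one dict per record), and the `numeric_ranges` argument are all hypothetical.

```python
def gower_distance(row_a, row_b, numeric_ranges):
    """Average per-column dissimilarity between two records.

    row_a, row_b: dicts mapping column name -> value (same keys in both).
    numeric_ranges: dict mapping each numeric column -> (min, max) over the data;
    every column not listed there is treated as categorical (assumption).
    """
    total = 0.0
    for column in row_a:
        if column in numeric_ranges:
            low, high = numeric_ranges[column]
            span = high - low
            # Numeric columns: absolute difference scaled to [0, 1] by the column range.
            total += abs(row_a[column] - row_b[column]) / span if span else 0.0
        else:
            # Categorical columns: 0 if the values match, 1 otherwise.
            total += 0.0 if row_a[column] == row_b[column] else 1.0
    return total / len(row_a)


if __name__ == "__main__":
    ranges = {"size": (1.0, 3.0)}
    a = {"size": 1.0, "brand": "x", "country": "CA"}
    b = {"size": 2.0, "brand": "x", "country": "US"}
    print(gower_distance(a, b, ranges))
```

Because every column contributes a value between 0 and 1, the column meanings stay intact, which is the property the post says is lost with dimensionality reduction.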
    How Useful Is Small Data Experimentation? [Discussion]
    Hey all, I'm a front end engineer with an interest in building user interfaces for machine learning tools. I'm wanting to build a library for creating web user interfaces that leverage numpy, scipy, sklearn, and any other data science library supported by Pyodide (a tool for running Python in the web browser). The advantage is that there's no environment setup, can run anywhere, and that complex user interfaces can be built easily using existing web frameworks (like React, Vue, etc). The limitations of this is that since it's running in the browser, runtimes will be longer (1-12x longer according to documentations), and working with bigger data sets may not be feasible. Also, we'd have access to only a few of the data science libraries available in Python (mentioned above). So, really,…  ( 2 min )
  • Open

    Gato & AGI doubts
    Just read https://thenextweb.com/news/deepminds-astounding-new-gato-ai-makes-fear-humans-will-never-achieve-agi? I didn't read the whole article but based off the title I'm guessing the author of the article doubts AGI will happen because of what he sees with Gato? Why would he think that? Maybe I'm missing something submitted by /u/Ashamed-Asparagus-93 [link] [comments]  ( 1 min )
    Artificial Intelligence Books: These 10 Sci-Fi Novels You Must Read
    submitted by /u/much_successes [link] [comments]  ( 1 min )
    New ML tool to help data scientists manage cloud workloads with Terraform
    submitted by /u/thumbsdrivesmecrazy [link] [comments]
    GALLERY
    submitted by /u/cookingandcraft [link] [comments]
    Artificial Intelligence Implications: The Future of Formula One | This is my second post in a university blog series surrounding AI, The Future, and a personal area of interest! Would love some feedback and advice for future episodes!
    submitted by /u/RvZz11 [link] [comments]  ( 1 min )
    Seeking the perfect song recommendation method
    I hope you are doing as well as you can. I have been investigating and researching solutions to a problem for a few months now, and I need your valuable help today. Problem: Generate a playlist based on multiple users' music tastes. The goal is that the playlist suits everyone, with the highest level of satisfaction for every user. Research path Spotify and recommendation engines I arrived at the following conclusion: the best way to achieve it is to build a music recommendation engine, but of course, you don't want to create it by yourself because it will need a huge amount of data that I can't provide nor find, so I need to work with a third-party service that already has this data and recommendation engine, and it seems like there are not many companies that offer that service and that'…  ( 3 min )
    Hello, if i can have a minute of your time to answer some question about A.I and its uses in cyber crime hope you will help me with this one.
    submitted by /u/SpoiledCheweez [link] [comments]  ( 1 min )
    ai is making ai jokes about ai with ai voice...
    submitted by /u/Alive_Ad_2882 [link] [comments]
    DeepMind’s new AI "Gato" can do 604 tasks e.g play Atari, control Robots, chat, caption Images, etc
    submitted by /u/qptbook [link] [comments]
    AI vs AI - Two AI Bots Chatting
    submitted by /u/Alive_Ad_2882 [link] [comments]  ( 1 min )
    Microsoft AI Team Proposes Floquet Codes: A New Class Of Quantum Error Correction Codes That Are Well Suited To Topological Qubits
    The Microsoft Azure Quantum program is built on technological advancements that enable quantum computing to scale. Microsoft announced the notion of topological qubits in March 2022, which are qubits that are theoretically more stable than existing ones without sacrificing size or speed. However, developing a general-purpose quantum computer capable of solving industrial-scale issues will necessitate innovation at all levels of the quantum stack, from nanoscale materials to algorithms and applications. Scientists state that quantum states are inherently fragile and are quickly destroyed when a qubit is coupled to its surroundings. This results in noise, which makes developing a quantum computer a challenging task. Error correction, which is also utilized in traditional digital computing, is a critical technology for overcoming this fragility. Quantum error correction (QEC) can detect and fix most faults that occur on physical qubits by encoding the state of a single logical qubit into multiple physical qubits. However, depending on the quality of the physical qubits, error correction might raise a computation’s space requirements by a factor of thousands and its time requirements by more than tenfold. As a result, any improvements in error correction have a huge positive impact throughout the stack. Continue Reading Paper: https://arxiv.org/pdf/2202.11829.pdf submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
  • Open

    Sampling with replacement until you’ve seen everything
    Suppose you have a standard deck of 52 cards. You pull out a card, put it back in the deck, shuffle, and pull out another card. How long would you expect to do this until you’ve seen every card? Here’s a variation on the same problem. Suppose you’re a park ranger keeping data on tagged […] Sampling with replacement until you’ve seen everything first appeared on John D. Cook.  ( 3 min )
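The card question is the classic coupon collector problem, for which the expected number of draws is n times the n-th harmonic number. The excerpt above is truncated, so the sketch below is not taken from the article; it is a minimal illustration with hypothetical function names that computes the exact expectation and offers a simulation for a sanity check.

```python
import random
from fractions import Fraction


def expected_draws(n):
    """Expected draws with replacement until all n items are seen: n * H_n."""
    harmonic = sum(Fraction(1, k) for k in range(1, n + 1))
    return float(n * harmonic)


def simulate_draws(n, rng):
    """Draw uniformly with replacement until every one of n items has appeared."""
    seen = set()
    draws = 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        draws += 1
    return draws


if __name__ == "__main__":
    rng = random.Random(0)
    trials = [simulate_draws(52, rng) for _ in range(2000)]
    print("exact expectation:", expected_draws(52))
    print("simulated average:", sum(trials) / len(trials))
```

For a 52-card deck the exact expectation comes out at roughly 236 draws, far more than the 52 cards in the deck, because the last few unseen cards take a long time to turn up.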

  • Open

    Last Week in AI: FDA Clearances, Firing at Google AI, AI for Apple Watch, AI Reviews Beer and Wine
    submitted by /u/regalalgorithm [link] [comments]
    The most effective method to befuddle AI models
    submitted by /u/p0goniphaft111 [link] [comments]
    amazing
    submitted by /u/Murat1_31 [link] [comments]
    AI News | New Intel Gaudi2 & Greco AI Deep Learning Processors | AI Predicts Crohn Disease | AI & Ocean Waves
    submitted by /u/SlightSituation [link] [comments]
    Rendering 3D objects using differentiable SDFs
    submitted by /u/imapurplemango [link] [comments]  ( 1 min )
    Gato: A single Transformer to RuLe them all! (Deepmind's new model)
    submitted by /u/OnlyProggingForFun [link] [comments]
    China Says It’s 3D Printing a 590-Foot Hydroelectric Dam With Zero Human Labor
    submitted by /u/estasfuera [link] [comments]
    Intel’s Habana Labs Introduces Second-Generation AI Deep Learning Processors
    https://preview.redd.it/8ftfke16f7z81.jpg?width=8688&format=pjpg&auto=webp&s=4429ed198c9d0adc06cd0e70a5dda54a1bc070d8 Today, Intel’s Habana Labs team announced two critical new products: Gaudi2, the 2nd iteration of the Gaudi DL training processor, and Greco, the successor to the Goya DL inference processor. Intel’s CPUs are significantly faster than their predecessors and competitors. Gaudi2 and Greco are the first new chips launched by Habana Labs following its acquisition by Intel, and both have switched from a 16nm to a 7nm process (via TSMC, the manufacturer). The 10 Tensor processing cores present in the Gaudi training processor have been increased to 24. Also, the in-package memory capacity has tripled from 32GB (HBM2) to 96GB (HBM2E). The onboard SRAM has been increased from 24MB to 48MB. Gaudi2 is the first and only accelerator with such an amount of memory. The processor has a TDP of 600W (vs. 350W for Gaudi), although it is claimed that it still employs passive cooling and does not require liquid cooling. Continue Reading submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    Meta AI Introduces ‘Make-A-Scene’: A Deep Generative Technique Based On An Autoregressive Transformer For Text-To-Image Synthesis With Human Priors
    In recent years, the research related to text-to-image generation has been growing exponentially. Nevertheless, the current methods still lack at least three essential characteristics. First of all, most models accept only text as input. This is a massive limitation, as the controllability of the model is limited to style or color and cannot be extended to structure or form, for example. The second limitation is related to human perception: the final aim of these models is to match human perception and attention, but in reality the generation process does not include any relevant prior knowledge about this. For example, the losses which control the generation are usually applied to the whole image without adding a specific focus on parts fundamental to human perception (such as human faces, animals, or salient objects). The last missing characteristic is the always-present problem of quality and resolution, as most works are limited to an output resolution of 256×256. For these reasons, the team at Facebook AI has introduced Make-A-Scene. This novel method tackles these three gaps while attaining SOTA results for text-to-image generation. The proposed model is essentially three encoders with discrete tokens, an auto-regressive transformer that learns to generate sequences of tokens conditioned on the scene segmentation, and a decoder that generates images from this transformer-generated sequence. It is important to note that the network does not use the segmented scene for computing the loss; thus, the segmentation is not necessary at inference time. The model is summarized in the figure below. Continue Reading This Summary Paper: https://arxiv.org/pdf/2203.13131v1.pdf submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    BlobGAN: A GAN model that uses simple blobs to manipulate objects in images
    submitted by /u/OnlyProggingForFun [link] [comments]
  • Open

    Enhance the caller experience with hints in Amazon Lex
    We understand speech input better if we have some background on the topic of conversation. Consider a customer service agent at an auto parts wholesaler helping with orders. If the agent knows that the customer is looking for tires, they’re more likely to recognize responses (for example, “Michelin”) on the phone. Agents often pick up […]  ( 5 min )
    Run automatic model tuning with Amazon SageMaker JumpStart
    In December 2020, AWS announced the general availability of Amazon SageMaker JumpStart, a capability of Amazon SageMaker that helps you quickly and easily get started with machine learning (ML). In March 2022, we also announced the support for APIs in JumpStart. JumpStart provides one-click fine-tuning and deployment of a wide variety of pre-trained models across […]  ( 7 min )
  • Open

    [R] Scaling tasks during pre-training may be > Scaling Model Parameters
    https://arxiv.org/pdf/2201.06910.pdf ...This leads to a crucial discovery that task scaling can be an efficient alternative to model scaling; i.e., the model size has little impact on performance with an extremely large number of tasks. Our results show that task scaling can substantially improve training efficiency by 30 times in FLOPs.. tl;dr scale the amount of tasks as well as data, compute, hyperparameters, FLOPs and prayers for effectively training LLMs. submitted by /u/Competitive-Rub-1958 [link] [comments]  ( 1 min )
    [Discussion] We tried learning AI from games. How about learning from players?
    I wrote a post arguing that video games are more relevant than ever for AI research. Essentially, RL is at an impasse, and all the really impressive progress comes from self-supervised learning. Could we learn behavior foundation models from millions of traces of real humans playing real games, and would this get us more general and more real intelligence? https://modl.ai/learning-ai-from-players/ submitted by /u/togelius [link] [comments]  ( 2 min )
    [P] I was tired of screenshotting plots in Jupyter to share my results. Wanted something better, information rich. So I built a new %%share magic that freezes a cell, captures its code, output & data and returns a URL for sharing.
    ​ https://reddit.com/link/uosqgm/video/pxk7h4jb49z81/player You can try it out in Colab here: https://colab.research.google.com/drive/1E5oU6TjH6OocmvEfU-foJfvCTbTfQrqd?usp=sharing#scrollTo=cVxS_6rBmLKW To install: pip install thousandwords Then in Jupyter Notebook: from thousandwords import share Then: %%share # Your Python code goes here.. More details: https://docs.1000words-hq.com/docs/python-sdk/share Source: https://github.com/edouard-g/thousandwords Homepage: https://1000words-hq.com ------------------------------- EDIT: Thanks for upvotes and the feedback. People have voiced their concerns of inadvertent data leaks, and that the Python package wasn't doing enough to warn the user ahead of time. As a short-term mitigation, I've pushed an update. The %%share magic now warns the user about exactly what gets shared and requires manual confirmation (details below). We'll be looking into building an option to share privately. Feel free to ping me for questions/concerns. More details on the mitigation: from thousandwords import share x = 1 Then: In [3]: %%share ...: print(x) This will upload 'x' server-side. Anyone with the link will have read access. Do you wish to proceed ? [y/N] ​ submitted by /u/Left_Ad8361 [link] [comments]  ( 5 min )
    [R] Clinical Prompt Learning with Frozen Language Models
    Hi everyone! We are a team of computational and clinical scientists for Chronosig Project, Oxford. We are excited to share our work on prompt learning for clinical support tasks. A major concern in the field of NLP is that even the largest pre-trained language models, such as GPT-3, do not perform well in specialized domains such as clinical texts. Typically adapting large PLMs to new domains requires fine-tuning entire models on new texts and downstream tasks, which can require entire fleets of high-end GPUs. Prompt learning is a new paradigm of NLP to fully utilize the pretrained language model based on its pretraining task and can require substantially fewer training parameters given a new task. In this work, we have investigated the feasibility of using prompt learning with small or medium domain trained frozen language models versus "a traditional" fine-tuning on clinical tasks using MIMIC-III data with full-data and few-shot settings. We also develop a new clinical triage task based on the ICD-9 codes, available for future use. We demonstrated that prompt learning requires fewer tuned parameters to match and even outperform traditional fine-tuning in few-shot settings. We actually observed prompt learning could match the performance of traditional fine-tuning with 1000 times fewer trained parameters. This is particularly encouraging for high-stake domains, such as medicine, where high-quality annotated data is scarce. We hope prompt tuning can offer a competitive alternative to a fine-tuning of large language models in the medical domain and act as a framework moving forward. Arxiv: https://arxiv.org/abs/2205.05535 GitHub: https://github.com/NtaylorOX/Public_Clinical_Prompt EDIT: For someone who may need TL;DR: we are creating a blog post, and we hope it will be released by next week! ​ ​ An illustration of the workflow of CPL submitted by /u/YiTsubasa [link] [comments]  ( 2 min )
    [D] ZIP models as a means to handle regression on data with excess of zeros
    Hi everyone, sharing an article on how to handle regression for data that has lots and lots of zeros: VevestaX/ZIP_tutorial.md at main · Vevesta/VevestaX · GitHub. This article is based on my personal experience. While using regression on a problem of this kind on Kaggle, I was confounded by the excess of zeros in the dependent variable. The reason why models like plain regression and Poisson regression failed wasn't immediately clear to me. On digging, I found that two lesser-known models, Hurdle and ZIP (zero-inflated Poisson), are designed specifically for this problem. Happy to hear feedback on the article. Please let me know if you know other techniques to handle this problem of excess zeros in the dependent variable. submitted by /u/vevesta [link] [comments]  ( 1 min )
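As a rough illustration of why a plain Poisson model struggles here, the zero-inflated Poisson (ZIP) distribution mixes a point mass at zero (weight `pi`) with an ordinary Poisson (rate `lam`). The sketch below is not taken from the linked tutorial; the function names and the toy data are hypothetical, and it only evaluates likelihoods rather than fitting a regression.

```python
import math


def zip_pmf(k, lam, pi):
    """P(Y = k) under a zero-inflated Poisson with extra-zero weight pi."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # A zero can come from the inflation component or from the Poisson itself.
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


def zip_log_likelihood(counts, lam, pi):
    """Log-likelihood of observed counts; pi = 0 reduces to a plain Poisson."""
    return sum(math.log(zip_pmf(k, lam, pi)) for k in counts)


if __name__ == "__main__":
    # Toy data (made up): 70 zeros plus ten each of 1, 2 and 3, mean 0.6.
    counts = [0] * 70 + [1, 2, 3] * 10
    print("Poisson log-likelihood:", zip_log_likelihood(counts, lam=0.6, pi=0.0))
    print("ZIP log-likelihood:    ", zip_log_likelihood(counts, lam=2.0, pi=0.7))
```

Both parameter settings have the same mean of 0.6, but the ZIP setting can explain the pile of zeros and the larger counts at the same time, so its log-likelihood on this toy data is higher.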
    [D] System Requirement to host a finetuned GPT Neo X 20B model
    GPT 3 Davinci with 175B parameters costs $0.0600 / 1K tokens. Their cheaper model Curie, with 13B parameters, costs $0.0060 / 1K tokens. GPT Neo X has 20B parameters... But on so many sites, GPT Neo X with 20B parameters costs more than GPT 3 Davinci with 175B parameters... For example, on NLPCloud, GPT Neo X costs $0.095 / 1K tokens, and on Goose AI it costs 0.063 / 1K tokens. The quality difference between Davinci and Neo X is huge, so why is its price higher than Davinci's? So I was thinking about hosting it on my own custom server. I know fine-tuning it is going to require a lot of resources, but it's only a one-time thing, so I don't really mind it that much... But can you please give me an estimate of how much it is going to cost, or what the system requirements are, to host an already fine-tuned Neo X model on a server? Btw, my usage is to process around 200,000 tokens per day (~1000-2000 requests). submitted by /u/Homeless_Programmer [link] [comments]  ( 1 min )
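The numbers in the post can be turned into a quick back-of-the-envelope comparison. The sketch below uses only the prices and the 200,000 tokens per day quoted above; treating the Goose AI figure as US dollars and assuming 2 bytes per parameter (16-bit weights) are assumptions, and the memory figure covers weights only, not activations or any serving overhead.

```python
def daily_cost_usd(tokens_per_day, usd_per_1k_tokens):
    """API cost per day at a flat per-1K-token price."""
    return tokens_per_day / 1000 * usd_per_1k_tokens


def weight_memory_gb(num_params, bytes_per_param=2):
    """Memory needed just to hold the weights (assumes 16-bit parameters)."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    tokens_per_day = 200_000
    prices = {
        "GPT 3 Davinci": 0.06,
        "NLPCloud GPT Neo X": 0.095,
        "Goose AI GPT Neo X": 0.063,  # assumed to be USD, as for the others
    }
    for name, price in prices.items():
        print(f"{name}: ${daily_cost_usd(tokens_per_day, price):.2f} per day")
    print(f"20B weights at 16 bits: {weight_memory_gb(20e9):.0f} GB")
```

At this volume the hosted options come to roughly 12 to 19 dollars per day, while self-hosting needs on the order of 40 GB of accelerator memory for the weights alone, so the comparison is between that daily API bill and the price of a server with that much GPU memory.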
    [D] word misleading classification model
    In my text classification task, a particular word is misleading the model, because it occurs very frequently in the training data for a particular label. E.g. my training data contains "lost my phone", "changed my phone", ... and all of these examples belong to the "problem with telephone" class. I am using Universal Sentence Encoder to build the model. During inference, if I give it some random sentence with the word "phone" in the middle, my model still predicts the "problem with telephone" class with high confidence. How should we handle this situation? submitted by /u/NSVR57 [link] [comments]  ( 1 min )
    [D] Anomaly detection - deviation from normal
    Hi everyone, I am collecting temperature data from a sensor measuring the temperature of a process. The temperature rises until it reaches steady state during operation, and it cools down when the process is stopped. I would like to set a simple threshold that notifies me when the temperature during operation is higher or lower than expected. How should I tackle the following: How do I filter out temperatures recorded while the process is warming up (non-steady state)? How much data should I use to define the threshold? Something like a moving window (median, MAD, ...) or a mean reversion on all historical data? I am measuring the room temperature with another sensor; how would you remove seasonality using the second sensor's temperature data (subtraction, division, ...)? How do I update the thresholds as new data is collected? Thank you all. submitted by /u/No-Way3852 [link] [comments]  ( 1 min )
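The moving-window idea mentioned in the post can be sketched with a rolling median and MAD band. This is one possible setup, not a recommendation from the thread: subtracting the room temperature (rather than dividing), the window size, and the multiplier `k` are all assumptions, and the warm-up phase is assumed to have been removed before the data reaches this function.

```python
from statistics import median


def mad_bounds(window, k=3.0):
    """Return (low, high): median of the window plus or minus k scaled MADs."""
    center = median(window)
    mad = median(abs(value - center) for value in window)
    # 1.4826 makes the MAD comparable to a standard deviation for Gaussian data.
    spread = 1.4826 * mad
    return center - k * spread, center + k * spread


def flag_anomalies(process_temp, room_temp, window_size=20, k=3.0):
    """Indices whose room-corrected temperature leaves the rolling band."""
    corrected = [p - r for p, r in zip(process_temp, room_temp)]
    flagged = []
    for i in range(window_size, len(corrected)):
        # The band only uses the previous window_size points, so it updates as data arrives.
        low, high = mad_bounds(corrected[i - window_size:i], k)
        if not low <= corrected[i] <= high:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    process = [80.0 + 0.1 * (i % 3) for i in range(40)]
    process[30] = 95.0  # injected spike
    room = [20.0] * 40
    print(flag_anomalies(process, room))
```

Because the median and MAD are robust to a few outliers, the spike itself barely moves the band for the points that follow it, which is the main reason to prefer them over a rolling mean and standard deviation in this kind of setup.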
  • Open

    Startups using Reinforcement Learning
    What are the upcoming startups that are based on RL? I know covariant.ai and kindred.ai use RL and are there any other RL-based startups submitted by /u/blitzkreig3 [link] [comments]
    Flattening the layers that preprocess the observation space
    In this paper, the observation space is a dictionary that has four keys: - Image - Direction of the agent - Position of the agent - Recurrent state of the policy From line 237 to line 248 (https://github.com/google-research/google-research/blob/98b89bd2d67c7944a8ac381e0417d4c20a6c87ee/social_rl/multiagent_tfagents/joint_attention/attention_networks.py#L237), a few preprocessing layers to process those four tensors of the obs space are built. What I don't understand is what happens after that: preprocessing_nest = tf.nest.map_structure(lambda l: None, preprocessing_layers) flat_observation_spec = nest_utils.flatten_up_to( preprocessing_nest, observation_spec, ) If I understand correctly, a lambda None function is applied to each of those preprocessing layers. This is then flattened up to the observation space, to retain some structure. I don't understand what this is exactly doing. Can someone explain? submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
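    As a rough illustration of what the two calls in the post do, here is a pure-Python imitation that needs no TensorFlow. The helpers `map_structure` and `flatten_up_to` below are simplified stand-ins written for this sketch, not the library implementations, and the layer names and spec values are hypothetical placeholders. The idea, as far as the snippet suggests, is that replacing every layer with None yields a "shallow" template with one leaf per preprocessing layer, and flattening the observation spec only up to that template yields one entry per layer, even when an entry is itself nested.

```python
def map_structure(fn, structure):
    """Apply fn to every leaf of a nested dict/list/tuple (simplified stand-in)."""
    if isinstance(structure, dict):
        return {k: map_structure(fn, v) for k, v in structure.items()}
    if isinstance(structure, (list, tuple)):
        return type(structure)(map_structure(fn, v) for v in structure)
    return fn(structure)


def flatten_up_to(shallow, deep):
    """Flatten `deep` only as far as `shallow` is nested (simplified stand-in).

    Dict keys are visited in sorted order, mirroring how tf.nest orders dicts.
    """
    if isinstance(shallow, dict):
        out = []
        for key in sorted(shallow):
            out.extend(flatten_up_to(shallow[key], deep[key]))
        return out
    if isinstance(shallow, (list, tuple)):
        out = []
        for s, d in zip(shallow, deep):
            out.extend(flatten_up_to(s, d))
        return out
    return [deep]  # a leaf of the template: keep the matching sub-structure whole


if __name__ == "__main__":
    # Hypothetical placeholders standing in for Keras layers and tensor specs.
    preprocessing_layers = {"image": "conv", "direction": "one_hot",
                            "position": "dense", "policy_state": "identity"}
    observation_spec = {"image": "img_spec", "direction": "dir_spec",
                        "position": "pos_spec", "policy_state": ("h_spec", "c_spec")}
    template = map_structure(lambda _: None, preprocessing_layers)
    print(flatten_up_to(template, observation_spec))
```

    In this toy run the nested `policy_state` entry stays intact as a single item, so the flat list lines up one-to-one with the preprocessing layers.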
    Which learning algorithms count as "Deep RL"?
    Deep Q-Networks are of course Deep RL. Is PPO another example? How about A2C/A3C/SAC? I don't know enough about these algorithms yet to decide for myself and I'm struggling to find adequate information online. Thanks in advance! submitted by /u/C_BearHill [link] [comments]  ( 1 min )
    In how many steps should we update our networks in DDPG?
    Correct me if I am wrong, but in the original DDPG paper I saw that the update for the actor and critic networks was inside the step loop (of an episode). But this will be very expensive computationally. So, after how many steps or global_steps should I update my models? submitted by /u/Better-Ad8608 [link] [comments]  ( 1 min )
    Q-Learning Example Tutorial (w/ Q-table & Bellman equation)
    submitted by /u/lukenewmann1 [link] [comments]
    Cannot find where’s the in-place operation error.
    I'm trying to train the MADDPG model; however, an in-place operation error has occurred. Here's the traceback, and I've taken some excerpts that I think are critical: RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [16, 2]], which is output 0 of AsStridedBackward0, is at version 3; expected version 2 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck! [W ..\torch\csrc\autograd\python_anomaly_mode.cpp:104] Warning: Error detected in AddmmBackward0. Traceback of forward call that caused the error: ... c_loss, a_loss = all_agents.learn(memory) File "C:\Users\chhu…  ( 2 min )
    Gato: A single Transformer to RuLe them all! (Deepmind's new model)
    submitted by /u/OnlyProggingForFun [link] [comments]  ( 1 min )
    Need some help for recommendation system
    Hello. As part of a short internship (around a month long), I am required to create a recommendation system for an eCommerce site. I was planning to build some intuition for how state transitions will take place and what the state will look like. Can you help me decide on the environment, state, reward, and policy for an eCommerce setting? Dataset - https://tianchi.aliyun.com/competition/entrance/231721/information?lang=en-us submitted by /u/kachua26 [link] [comments]  ( 2 min )
    "Searching for Efficient Neural Architectures for On-Device ML on Edge TPUs", Akin et al 2022 {G}
    submitted by /u/gwern [link] [comments]  ( 1 min )
  • Open

    New AI Deep Learning Processors | Intel's Gaudi2 & Greco
    submitted by /u/getrich_or_diemining [link] [comments]
  • Open

    Broom, Broom: WeRide Revs Up Self-Driving Street Sweepers Powered by NVIDIA
    When it comes to safety, efficiency and sustainability, autonomous vehicles are delivering a clean sweep. Autonomous vehicle company and NVIDIA Inception member WeRide this month began a public road pilot of its Robo Street Sweepers. The vehicles, designed to perform round-the-clock cleaning services, are built on the high-performance, energy-efficient compute of NVIDIA. The fleet of… The post Broom, Broom: WeRide Revs Up Self-Driving Street Sweepers Powered by NVIDIA appeared first on NVIDIA Blog.  ( 2 min )
  • Open

    A Beginners Guide to RPA ‘ Bots’
    Enterprises burdened with the processes that are repetitive, time-consuming & mundane are constantly on the lookout for new and innovative…  ( 4 min )
  • Open

    How Does Knowledge Graph Embedding Extrapolate to Unseen Data: A Semantic Evidence View. (arXiv:2109.11800v3 [cs.CL] UPDATED)
    Knowledge Graph Embedding (KGE) aims to learn representations for entities and relations. Most KGE models have achieved great success, especially in extrapolation scenarios. Specifically, given an unseen triple (h, r, t), a trained model can still correctly predict t from (h, r, ?), or h from (?, r, t); such extrapolation ability is impressive. However, most existing KGE works focus on the design of delicate triple modeling functions, which mainly tell us how to measure the plausibility of observed triples but offer limited explanation of why the methods can extrapolate to unseen data and which factors help KGE extrapolate. Therefore, in this work, we study KGE extrapolation through two problems: 1. How does KGE extrapolate to unseen data? 2. How can we design a KGE model with better extrapolation ability? For problem 1, we first discuss the factors that affect extrapolation and, at the relation, entity and triple levels respectively, propose three Semantic Evidences (SEs), which can be observed from the training set and provide important semantic information for extrapolation. Then we verify the effectiveness of SEs through extensive experiments on several typical KGE methods. For problem 2, to make better use of the three levels of SE, we propose a novel GNN-based KGE model, called Semantic Evidence aware Graph Neural Network (SE-GNN). In SE-GNN, each level of SE is modeled explicitly by the corresponding neighbor pattern and merged sufficiently by multi-layer aggregation, which contributes to obtaining more extrapolative knowledge representations. Finally, through extensive experiments on the FB15k-237 and WN18RR datasets, we show that SE-GNN achieves state-of-the-art performance on the Knowledge Graph Completion task and exhibits better extrapolation ability. Our code is available at https://github.com/renli1024/SE-GNN.  ( 3 min )
    Positive, Negative and Neutral: Modeling Implicit Feedback in Session-based News Recommendation. (arXiv:2205.06058v1 [cs.IR])
    News recommendation for anonymous readers is a useful but challenging task for many news portals, where interactions between readers and articles are limited within a temporary login session. Previous works tend to formulate session-based recommendation as a next item prediction task, while they neglect the implicit feedback from user behaviors, which indicates what users really like or dislike. Hence, we propose a comprehensive framework to model user behaviors through positive feedback (i.e., the articles they spend more time on) and negative feedback (i.e., the articles they choose to skip without clicking in). Moreover, the framework implicitly models the user using their session start time, and the article using its initial publishing time, in what we call "neutral feedback". Empirical evaluation on three real-world news datasets shows the framework's promising performance, with more accurate, more diverse, and even more unexpected recommendations than other state-of-the-art session-based recommendation approaches.  ( 2 min )
    RLOC: Terrain-Aware Legged Locomotion using Reinforcement Learning and Optimal Control. (arXiv:2012.03094v3 [cs.RO] UPDATED)
    We present a unified model-based and data-driven approach for quadrupedal planning and control to achieve dynamic locomotion over uneven terrain. We utilize on-board proprioceptive and exteroceptive feedback to map sensory information and desired base velocity commands into footstep plans using a reinforcement learning (RL) policy. This RL policy is trained in simulation over a wide range of procedurally generated terrains. When run online, the system tracks the generated footstep plans using a model-based motion controller. We evaluate the robustness of our method over a wide variety of complex terrains. It exhibits behaviors which prioritize stability over aggressive locomotion. Additionally, we introduce two ancillary RL policies for corrective whole-body motion tracking and recovery control. These policies account for changes in physical parameters and external perturbations. We train and evaluate our framework on a complex quadrupedal system, ANYmal version B, and demonstrate transferability to a larger and heavier robot, ANYmal C, without requiring retraining.  ( 2 min )
    A BIC-based Mixture Model Defense against Data Poisoning Attacks on Classifiers. (arXiv:2105.13530v2 [cs.LG] UPDATED)
    Data Poisoning (DP) is an effective attack that causes trained classifiers to misclassify their inputs. DP attacks significantly degrade a classifier's accuracy by covertly injecting attack samples into the training set. Broadly applicable to different classifier structures, without strong assumptions about the attacker, an unsupervised Bayesian Information Criterion (BIC)-based mixture model defense against "error generic" DP attacks is herein proposed that: 1) addresses the most challenging embedded DP scenario wherein, if DP is present, the poisoned samples are an a priori unknown subset of the training set, and with no clean validation set available; 2) applies a mixture model both to well-fit potentially multi-modal class distributions and to capture poisoned samples within a small subset of the mixture components; 3) jointly identifies poisoned components and samples by minimizing the BIC cost defined over the whole training set, with the identified poisoned data removed prior to classifier training. Our experimental results, for various classifier structures and benchmark datasets, demonstrate the effectiveness and universality of our defense under strong DP attacks, as well as its superiority over other works.  ( 2 min )
    Ensemble Clustering via Co-association Matrix Self-enhancement. (arXiv:2205.05937v1 [cs.LG])
    Ensemble clustering integrates a set of base clustering results to generate a stronger one. Existing methods usually rely on a co-association (CA) matrix that measures how many times two samples are grouped into the same cluster according to the base clusterings to achieve ensemble clustering. However, when the constructed CA matrix is of low quality, the performance will degrade. In this paper, we propose a simple yet effective CA matrix self-enhancement framework that can improve the CA matrix to achieve better clustering performance. Specifically, we first extract the high-confidence (HC) information from the base clusterings to form a sparse HC matrix. By propagating the highly-reliable information of the HC matrix to the CA matrix and complementing the HC matrix according to the CA matrix simultaneously, the proposed method generates an enhanced CA matrix for better clustering. Technically, the proposed model is formulated as a symmetric constrained convex optimization problem, which is efficiently solved by an alternating iterative algorithm with convergence and global optimum theoretically guaranteed. Extensive experimental comparisons with twelve state-of-the-art methods on eight benchmark datasets substantiate the effectiveness, flexibility and efficiency of the proposed model in ensemble clustering. The codes and datasets can be downloaded at https://github.com/Siritao/EC-CMS.  ( 2 min )
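    For readers unfamiliar with the co-association (CA) matrix described in the abstract, a minimal sketch of how such a matrix can be built from base clusterings follows. It covers only the plain CA construction (counting how often two samples share a cluster), not the paper's self-enhancement step; the function name and the optional normalisation are assumptions for illustration.

```python
def co_association(base_clusterings, normalize=False):
    """Build a co-association matrix from a list of label vectors.

    Entry [i][j] counts the base clusterings in which samples i and j
    receive the same label; with normalize=True it is the fraction instead.
    """
    n = len(base_clusterings[0])
    ca = [[0.0] * n for _ in range(n)]
    for labels in base_clusterings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    ca[i][j] += 1.0
    if normalize:
        m = len(base_clusterings)
        ca = [[value / m for value in row] for row in ca]
    return ca


if __name__ == "__main__":
    # Three toy base clusterings of four samples.
    base = [[0, 0, 1, 1], [0, 0, 0, 1], [1, 1, 0, 0]]
    for row in co_association(base):
        print(row)
```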
    Contingency-constrained economic dispatch with safe reinforcement learning. (arXiv:2205.06212v1 [eess.SY])
    Future power systems will rely heavily on micro grids with a high share of decentralised renewable energy sources and energy storage systems. The high complexity and uncertainty in this context might make conventional power dispatch strategies infeasible. Reinforcement learning (RL) based controllers can address this challenge; however, they cannot themselves provide safety guarantees, which prevents their deployment in practice. To overcome this limitation, we propose a formally validated RL controller for economic dispatch. We extend conventional constraints by a time-dependent constraint encoding the islanding contingency. The contingency constraint is computed using set-based backwards reachability analysis and actions of the RL agent are verified through a safety layer. Unsafe actions are projected into the safe action space while leveraging constrained zonotope set representations for computational efficiency. The developed approach is demonstrated on a residential use case using real-world measurements.  ( 2 min )
    Maximum sampled conditional likelihood for informative subsampling. (arXiv:2011.05988v3 [math.ST] UPDATED)
    Subsampling is a computationally effective approach to extract information from massive data sets when computing resources are limited. After a subsample is taken from the full data, most available methods use an inverse probability weighted (IPW) objective function to estimate the model parameters. The IPW estimator does not fully utilize the information in the selected subsample. In this paper, we propose to use the maximum sampled conditional likelihood estimator (MSCLE) based on the sampled data. We establish the asymptotic normality of the MSCLE and prove that its asymptotic variance-covariance matrix is the smallest among a class of asymptotically unbiased estimators, including the IPW estimator. We further discuss the asymptotic results with the L-optimal subsampling probabilities and illustrate the estimation procedure with generalized linear models. Numerical experiments are provided to evaluate the practical performance of the proposed method.  ( 2 min )
    Solving ImageNet: a Unified Scheme for Training any Backbone to Top Results. (arXiv:2204.03475v2 [cs.CV] UPDATED)
    ImageNet serves as the primary dataset for evaluating the quality of computer-vision models. The common practice today is training each architecture with a tailor-made scheme, designed and tuned by an expert. In this paper, we present a unified scheme for training any backbone on ImageNet. The scheme, named USI (Unified Scheme for ImageNet), is based on knowledge distillation and modern tricks. It requires no adjustments or hyper-parameter tuning between different models, and is efficient in terms of training time. We test USI on a wide variety of architectures, including CNNs, Transformers, Mobile-oriented and MLP-only. On all models tested, USI outperforms previous state-of-the-art results. Hence, we are able to transform training on ImageNet from an expert-oriented task to an automatic seamless routine. Since USI accepts any backbone and trains it to top results, it also makes it possible to perform methodical comparisons and identify the most efficient backbones along the speed-accuracy Pareto curve. Implementation is available at: https://github.com/Alibaba-MIIL/Solving_ImageNet  ( 2 min )
    Zero-shot Code-Mixed Offensive Span Identification through Rationale Extraction. (arXiv:2205.06119v1 [cs.CL])
    This paper investigates the effectiveness of sentence-level transformers for zero-shot offensive span identification on a code-mixed Tamil dataset. More specifically, we evaluate rationale extraction methods of Local Interpretable Model Agnostic Explanations (LIME) (Ribeiro et al., 2016) and Integrated Gradients (IG) (Sundararajan et al., 2017) for adapting transformer based offensive language classification models for zero-shot offensive span identification. To this end, we find that LIME and IG show baseline $F_{1}$ of 26.35% and 44.83%, respectively. Besides, we study the effect of data set size and training process on the overall accuracy of span identification. As a result, we find both LIME and IG to show significant improvement with Masked Data Augmentation and Multilabel Training, with $F_{1}$ of 50.23% and 47.38% respectively. Disclaimer: This paper contains examples that may be considered profane, vulgar, or offensive. The examples do not represent the views of the authors or their employers/graduate schools towards any person(s), group(s), practice(s), or entity/entities. Instead they are used to emphasize only the linguistic research challenges.  ( 2 min )
    How I failed machine learning in medical imaging -- shortcomings and recommendations. (arXiv:2103.10292v2 [eess.IV] UPDATED)
    Medical imaging is an important research field with many opportunities for improving patients' health. However, there are a number of challenges that are slowing down the progress of the field as a whole, such as optimizing for publication. In this paper we reviewed several problems related to choosing datasets, methods, evaluation metrics, and publication strategies. With a review of literature and our own analysis, we show that at every step, potential biases can creep in. On a positive note, we also see that initiatives to counteract these problems are already being started. Finally we provide a broad range of recommendations on how to further address these problems in the future. For reproducibility, data and code for our analyses are available on https://github.com/GaelVaroquaux/ml_med_imaging_failures  ( 2 min )
    Framework for inferring empirical causal graphs from binary data to support multidimensional poverty analysis. (arXiv:2205.06131v1 [stat.ME])
    Poverty is one of the fundamental issues that mankind faces. The Multidimensional Poverty Index (MPI) is deployed for measuring poverty in a population beyond monetary measures. However, MPI cannot provide information regarding associations and causal relations among poverty factors. Does education cause income inequality in a specific region? Is lacking education a cause of health issues? Without knowing causal relations, policy makers cannot pinpoint the root causes of poverty in a specific population, which might not be the same across different populations. Additionally, MPI requires binary data, which cannot be analyzed by most causal inference frameworks. In this work, we propose an exploratory-data-analysis framework for finding possible causal relations with confidence intervals among binary data. The proposed framework not only shows how severe the issue of poverty is, but also provides the causal relations among poverty factors. Moreover, knowing a confidence interval of the degree of causal direction lets us know how strong a causal relation is. We evaluated the proposed framework against several baseline approaches on simulation datasets as well as on two real-world datasets as case studies: 1) twin births in the United States: the relation between birth weight and mortality of twins, and 2) Thailand population surveys from 378k households of Chiang Mai and 353k households of Khon Kaen provinces. Our framework performed better than the baselines in most cases. The first case study reveals that almost all mortality cases in twins involve low birth weight, but not all low-birth-weight twins died. The second case study reveals that smoking is associated with drinking alcohol in both provinces and that there is a causal relation from smoking to drinking alcohol only in Chiang Mai province. The framework can be applied beyond the poverty context.  ( 3 min )
    Machine Learning Workflow to Explain Black-box Models for Early Alzheimer's Disease Classification Evaluated for Multiple Datasets. (arXiv:2205.05907v1 [cs.LG])
    Purpose: Hard-to-interpret black-box Machine Learning (ML) models were often used for early Alzheimer's Disease (AD) detection. Methods: To interpret eXtreme Gradient Boosting (XGBoost), Random Forest (RF), and Support Vector Machine (SVM) black-box models, a workflow based on Shapley values was developed. All models were trained on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset and evaluated for an independent ADNI test set, as well as the external Australian Imaging and Lifestyle flagship study of Ageing (AIBL), and Open Access Series of Imaging Studies (OASIS) datasets. Shapley values were compared to intuitively interpretable Decision Trees (DTs), and Logistic Regression (LR), as well as natural and permutation feature importances. To avoid the reduction of the explanation validity caused by correlated features, forward selection and aspect consolidation were implemented. Results: Some black-box models outperformed DTs and LR. The forward-selected features correspond to brain areas previously associated with AD. Shapley values identified biologically plausible associations with moderate to strong correlations with feature importances. The most important RF features to predict AD conversion were the volume of the amygdalae, and a cognitive test score. Good cognitive test performances and large brain volumes decreased the AD risk. The models trained using cognitive test scores significantly outperformed brain volumetric models ($p<0.05$). Cognitive Normal (CN) vs. AD models were successfully transferred to external datasets. Conclusion: In comparison to previous work, improved performances for ADNI and AIBL were achieved for CN vs. Mild Cognitive Impairment (MCI) classification using brain volumes. The Shapley values and the feature importances showed moderate to strong correlations.  ( 2 min )
    Sample Complexity Bounds for Robustly Learning Decision Lists against Evasion Attacks. (arXiv:2205.06127v1 [cs.LG])
    A fundamental problem in adversarial machine learning is to quantify how much training data is needed in the presence of evasion attacks. In this paper we address this issue within the framework of PAC learning, focusing on the class of decision lists. Given that distributional assumptions are essential in the adversarial setting, we work with probability distributions on the input data that satisfy a Lipschitz condition: nearby points have similar probability. Our key results illustrate that the adversary's budget (that is, the number of bits it can perturb on each input) is a fundamental quantity in determining the sample complexity of robust learning. Our first main result is a sample-complexity lower bound: the class of monotone conjunctions (essentially the simplest non-trivial hypothesis class on the Boolean hypercube) and any superclass has sample complexity at least exponential in the adversary's budget. Our second main result is a corresponding upper bound: for every fixed $k$ the class of $k$-decision lists has polynomial sample complexity against a $\log(n)$-bounded adversary. This sheds further light on the question of whether an efficient PAC learning algorithm can always be used as an efficient $\log(n)$-robust learning algorithm under the uniform distribution.  ( 2 min )
    Over-the-Air Federated Learning with Joint Adaptive Computation and Power Control. (arXiv:2205.05867v1 [cs.LG])
    This paper considers over-the-air federated learning (OTA-FL). OTA-FL exploits the superposition property of the wireless medium, and performs model aggregation over the air for free. Thus, it can greatly reduce the communication cost incurred in communicating model updates from the edge devices. In order to fully utilize this advantage while providing comparable learning performance to conventional federated learning that presumes model aggregation via noiseless channels, we consider the joint design of transmission scaling and the number of local iterations at each round, given the power constraint at each edge device. We first characterize the training error due to such channel noise in OTA-FL by establishing a fundamental lower bound for general functions with Lipschitz-continuous gradients. Then, by introducing an adaptive transceiver power scaling scheme, we propose an over-the-air federated learning algorithm with joint adaptive computation and power control (ACPC-OTA-FL). We provide the convergence analysis for ACPC-OTA-FL in training with non-convex objective functions and heterogeneous data. We show that the convergence rate of ACPC-OTA-FL matches that of FL with noise-free communications.  ( 2 min )
    Ethereum Fraud Detection with Heterogeneous Graph Neural Networks. (arXiv:2203.12363v2 [cs.LG] UPDATED)
    While transactions with cryptocurrencies such as Ethereum are becoming more prevalent, fraud and other criminal transactions are not uncommon. Graph analysis algorithms and machine learning techniques detect suspicious transactions that lead to phishing in large transaction networks. Many graph neural network (GNN) models have been proposed to apply deep learning techniques to graph structures. Although there is research on phishing detection using GNN models in the Ethereum transaction network, models that address the scale of the number of vertices and edges and the imbalance of labels have not yet been studied. In this paper, we compared the model performance of GNN models on the actual Ethereum transaction network dataset and phishing reported label data to exhaustively compare and verify which GNN models and hyperparameters produce the best accuracy. Specifically, we evaluated the model performance of representative homogeneous GNN models which consider single-type nodes and edges and heterogeneous GNN models which support different types of nodes and edges. We showed that heterogeneous models had better model performance than homogeneous models. In particular, the RGCN model achieved the best performance in the overall metrics.  ( 2 min )
    Anomaly Detection of Adversarial Examples using Class-conditional Generative Adversarial Networks. (arXiv:2105.10101v2 [cs.LG] UPDATED)
    Deep Neural Networks (DNNs) have been shown to be vulnerable to Test-Time Evasion attacks (TTEs, or adversarial examples), which, by making small changes to the input, alter the DNN's decision. We propose an unsupervised attack detector on DNN classifiers based on class-conditional Generative Adversarial Networks (GANs). We model the distribution of clean data conditioned on the predicted class label by an Auxiliary Classifier GAN (AC-GAN). Given a test sample and its predicted class, three detection statistics are calculated based on the AC-GAN Generator and Discriminator. Experiments on image classification datasets under various TTE attacks show that our method outperforms previous detection methods. We also investigate the effectiveness of anomaly detection using different DNN layers (input features or internal-layer features) and demonstrate, as one might expect, that anomalies are harder to detect using features closer to the DNN's output layer.  ( 2 min )
    Low-variance estimation in the Plackett-Luce model via quasi-Monte Carlo sampling. (arXiv:2205.06024v1 [stat.ML])
    The Plackett-Luce (PL) model is ubiquitous in learning-to-rank (LTR) because it provides a useful and intuitive probabilistic model for sampling ranked lists. Counterfactual offline evaluation and optimization of ranking metrics are pivotal for using LTR methods in production. When adopting the PL model as a ranking policy, both tasks require the computation of expectations with respect to the model. These are usually approximated via Monte-Carlo (MC) sampling, since the combinatorial scaling in the number of items to be ranked makes their analytical computation intractable. Despite recent advances in improving the computational efficiency of the sampling process via the Gumbel top-k trick, the MC estimates can suffer from high variance. We develop a novel approach to producing more sample-efficient estimators of expectations in the PL model by combining the Gumbel top-k trick with quasi-Monte Carlo (QMC) sampling, a well-established technique for variance reduction. We illustrate our findings both theoretically and empirically using real-world recommendation data from Amazon Music and the Yahoo learning-to-rank challenge.  ( 2 min )
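    The Gumbel top-k trick mentioned in the abstract can be illustrated with a minimal sketch: perturb each item's log-score with independent Gumbel(0, 1) noise and keep the k largest, which yields a ranking sampled from the PL model. This version uses Python's ordinary pseudo-random generator rather than the quasi-Monte Carlo construction the paper proposes, and the function names are illustrative assumptions.

```python
import math
import random


def sample_gumbel(rng):
    """Draw one Gumbel(0, 1) variate via inverse transform sampling."""
    u = rng.random()
    while u == 0.0:  # avoid log(0); rng.random() never returns 1.0
        u = rng.random()
    return -math.log(-math.log(u))


def gumbel_top_k(log_scores, k, rng):
    """Sample a top-k ranking from a Plackett-Luce model.

    `log_scores` are the log-weights of the items; the returned list holds
    item indices ordered from first to k-th position.
    """
    perturbed = [s + sample_gumbel(rng) for s in log_scores]
    order = sorted(range(len(log_scores)), key=lambda i: perturbed[i], reverse=True)
    return order[:k]


if __name__ == "__main__":
    rng = random.Random(0)
    print(gumbel_top_k([0.0, 1.0, 2.0, 3.0], k=4, rng=rng))
```

    Averaging a metric over many such sampled rankings gives the plain Monte-Carlo estimate whose variance the paper aims to reduce.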
    Multimodal Indoor Localisation for Measuring Mobility in Parkinson's Disease using Transformers. (arXiv:2205.06142v1 [cs.LG])
    Parkinson's disease (PD) is a slowly progressive debilitating neurodegenerative disease which is prominently characterised by motor symptoms. Indoor localisation, including number and speed of room to room transitions, provides a proxy outcome which represents mobility and could be used as a digital biomarker to quantify how mobility changes as this disease progresses. We use data collected from 10 people with Parkinson's, and 10 controls, each of whom lived for five days in a smart home with various sensors. In order to more effectively localise them indoors, we propose a transformer-based approach utilizing two data modalities, Received Signal Strength Indicator (RSSI) and accelerometer data from wearable devices, which provide complementary views of movement. Our approach makes asymmetric and dynamic correlations by a) learning temporal correlations at different scales and levels, and b) utilizing various gating mechanisms to select relevant features within modality and suppress unnecessary modalities. On a dataset with real patients, we demonstrate that our proposed method gives an average accuracy of 89.9%, outperforming competitors. We also show that our model is able to better predict in-home mobility for people with Parkinson's with an average offset of 1.13 seconds to ground truth.  ( 2 min )
    Improved Meta Learning for Low Resource Speech Recognition. (arXiv:2205.06182v1 [cs.CL])
    We propose a new meta-learning-based framework for low-resource speech recognition that improves on the previous model-agnostic meta-learning (MAML) approach. MAML is a simple yet powerful meta-learning approach. However, MAML presents some core deficiencies, such as training instabilities and slower convergence speed. To address these issues, we adopt multi-step loss (MSL). MSL calculates losses at every step of the inner loop of MAML and then combines them with a weighted importance vector. The importance vector ensures that the loss at the last step has more importance than the previous steps. Our empirical evaluation shows that MSL significantly improves the stability of the training procedure and thus also improves the accuracy of the overall system. Our proposed system outperforms the MAML-based low-resource ASR system on various languages in terms of character error rates and stable training behavior.  ( 2 min )
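    The combination step described in the abstract (per-step inner-loop losses merged through an importance vector that favours the last step) might look roughly like the sketch below. The specific weighting scheme, with a fixed share for the last step and the remainder split evenly, is an assumption made for illustration; the abstract does not say how the paper sets or schedules the weights.

```python
def importance_weights(num_steps, last_step_weight=0.6):
    """Hypothetical importance vector: the last inner-loop step gets
    `last_step_weight`, earlier steps share the remainder equally."""
    if num_steps == 1:
        return [1.0]
    if not (1.0 / num_steps < last_step_weight <= 1.0):
        raise ValueError("last step must outweigh each earlier step")
    rest = (1.0 - last_step_weight) / (num_steps - 1)
    return [rest] * (num_steps - 1) + [last_step_weight]


def multi_step_loss(step_losses, weights):
    """Weighted sum of the losses measured after each inner-loop step."""
    if len(step_losses) != len(weights):
        raise ValueError("one weight per inner-loop step is required")
    return sum(w * loss for w, loss in zip(weights, step_losses))


if __name__ == "__main__":
    losses = [4.0, 3.0, 2.0, 1.0]  # toy per-step losses
    print(multi_step_loss(losses, importance_weights(len(losses))))
```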
    SpecRepair: Counter-Example Guided Safety Repair of Deep Neural Networks. (arXiv:2106.01917v5 [cs.LG] UPDATED)
    Deep neural networks (DNNs) are increasingly applied in safety-critical domains, such as self-driving cars, unmanned aircraft, and medical diagnosis. It is of fundamental importance to certify the safety of these DNNs, i.e. that they comply with a formal safety specification. While safety certification tools exactly answer this question, they are of no help in debugging unsafe DNNs, requiring the developer to iteratively verify and modify the DNN until safety is eventually achieved. Hence, a repair technique needs to be developed that can produce a safe DNN automatically. To address this need, we present SpecRepair, a tool that efficiently eliminates counter-examples from a DNN and produces a provably safe DNN without harming its classification accuracy. SpecRepair combines specification-based counter-example search and resumes training of the DNN, penalizing counter-examples and certifying the resulting DNN. We evaluate SpecRepair's effectiveness on the ACAS Xu benchmark, a DNN-based controller for unmanned aircraft, and two image classification benchmarks. The results show that SpecRepair is more successful in producing safe DNNs than comparable methods, has a shorter runtime, and produces safe DNNs while preserving their classification accuracy.  ( 2 min )
    Ensemble Classifier Design Tuned to Dataset Characteristics for Network Intrusion Detection. (arXiv:2205.06177v1 [cs.CR])
    Machine Learning-based supervised approaches require highly customized and fine-tuned methodologies to deliver outstanding performance. This paper presents a dataset-driven design and performance evaluation of a machine learning classifier for the network intrusion dataset UNSW-NB15. Analysis of the dataset suggests that it suffers from class representation imbalance and class overlap in the feature space. We employed ensemble methods using Balanced Bagging (BB), eXtreme Gradient Boosting (XGBoost), and Random Forest empowered by Hellinger Distance Decision Tree (RF-HDDT). BB and XGBoost are tuned to handle the imbalanced data, and the Random Forest (RF) classifier is supplemented by the Hellinger metric to address the imbalance issue. Two new algorithms are proposed to address the class overlap issue in the dataset. These two algorithms are leveraged to help improve performance on the testing dataset by modifying the final classification decision made by the three base classifiers as part of the ensemble classifier, which employs a majority vote combiner. The proposed design is evaluated for both binary and multi-category classification. Comparing the proposed model to those reported on the same dataset in the literature demonstrates that the proposed model outperforms others by a significant margin for both binary and multi-category classification cases.
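A majority vote combiner over three base classifiers, as mentioned above, can be sketched like this. The tie-breaking rule (fall back to the earliest classifier among the tied labels) is an assumption for illustration; the abstract does not say how ties are resolved, and the two proposed class-overlap algorithms are not modelled here.

```python
from collections import Counter
from typing import Hashable, Sequence


def majority_vote(votes: Sequence[Hashable]) -> Hashable:
    """Return the label predicted by most base classifiers.

    Ties are broken in favour of the earliest classifier in `votes`
    (an assumed rule, chosen only to make the result deterministic).
    """
    counts = Counter(votes)
    top = max(counts.values())
    for label in votes:
        if counts[label] == top:
            return label
    raise ValueError("votes must not be empty")


if __name__ == "__main__":
    # Hypothetical predictions from three base classifiers for one flow.
    print(majority_vote(["attack", "normal", "attack"]))
```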
    Segmentation-Consistent Probabilistic Lesion Counting. (arXiv:2204.05276v2 [eess.IV] UPDATED)
    Lesion counts are important indicators of disease severity, patient prognosis, and treatment efficacy, yet counting as a task in medical imaging is often overlooked in favor of segmentation. This work introduces a novel continuously differentiable function that maps lesion segmentation predictions to lesion count probability distributions in a consistent manner. The proposed end-to-end approach--which consists of voxel clustering, lesion-level voxel probability aggregation, and Poisson-binomial counting--is non-parametric and thus offers a robust and consistent way to augment lesion segmentation models with post hoc counting capabilities. Experiments on Gadolinium-enhancing lesion counting demonstrate that our method outputs accurate and well-calibrated count distributions that capture meaningful uncertainty information. They also reveal that our model is suitable for multi-task learning of lesion segmentation, is efficient in low data regimes, and is robust to adversarial attacks.  ( 2 min )
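The final counting stage named above, Poisson-binomial counting, can be sketched as follows. The sketch assumes that each candidate lesion already has an independent existence probability; the voxel clustering and lesion-level aggregation that would produce those probabilities are not shown, and the function name is hypothetical.

```python
from typing import List, Sequence


def poisson_binomial_pmf(lesion_probs: Sequence[float]) -> List[float]:
    """Distribution of the number of lesions present.

    Entry k of the result is the probability that exactly k of the candidate
    lesions exist, treating each candidate as an independent Bernoulli trial.
    """
    pmf = [1.0]  # with no candidates, the count is 0 with certainty
    for p in lesion_probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)  # this candidate is absent
            nxt[k + 1] += mass * p      # this candidate is present
        pmf = nxt
    return pmf


if __name__ == "__main__":
    print(poisson_binomial_pmf([0.9, 0.2]))
```

Every operation in the recursion is a product or a sum of the input probabilities, so the count distribution remains differentiable with respect to them.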
    NER-MQMRC: Formulating Named Entity Recognition as Multi Question Machine Reading Comprehension. (arXiv:2205.05904v1 [cs.LG])
    NER has traditionally been formulated as a sequence labeling task. However, there has been a recent trend of posing NER as a machine reading comprehension task (Wang et al., 2020; Mengge et al., 2020), where the entity name (or other information) is considered as a question, the text as the context, and the entity value in the text as the answer snippet. These works consider MRC based on a single question (entity) at a time. We propose posing NER as a multi-question MRC task, where multiple questions (one question per entity) are considered at the same time for a single text. We propose a novel BERT-based multi-question MRC (NER-MQMRC) architecture for this formulation. The NER-MQMRC architecture considers all entities as input to BERT for learning token embeddings with self-attention and leverages BERT-based entity representations to further improve these token embeddings for the NER task. Evaluation on three NER datasets shows that our proposed architecture leads to on average 2.5 times faster training and 2.3 times faster inference compared to NER-SQMRC framework based models by considering all entities together in a single pass. Further, we show that our model performance does not degrade compared to single-question based MRC (NER-SQMRC) (Devlin et al., 2019), leading to F1 gains of +0.41%, +0.32% and +0.27% for the AE-Pub, Ecommerce5PT and Twitter datasets respectively. We propose this architecture primarily to solve large scale e-commerce attribute (or entity) extraction from unstructured text, with 50k+ attributes to be extracted in a scalable production environment with high performance and optimised training and inference runtimes.  ( 2 min )
    Addressing Census data problems in race imputation via fully Bayesian Improved Surname Geocoding and name supplements. (arXiv:2205.06129v1 [stat.ML])
    Prediction of an individual's race and ethnicity plays an important role in social science and public health research. Examples include studies of racial disparity in health and voting. Recently, Bayesian Improved Surname Geocoding (BISG), which uses Bayes' rule to combine information from Census surname files with the geocoding of an individual's residence, has emerged as a leading methodology for this prediction task. Unfortunately, BISG suffers from two Census data problems that contribute to unsatisfactory predictive performance for minorities. First, the decennial Census often contains zero counts for minority racial groups in the Census blocks where some members of those groups reside. Second, because the Census surname files only include frequent names, many surnames -- especially those of minorities -- are missing from the list. To address the zero counts problem, we introduce a fully Bayesian Improved Surname Geocoding (fBISG) methodology that accounts for potential measurement error in Census counts by extending the naïve Bayesian inference of the BISG methodology to full posterior inference. To address the missing surname problem, we supplement the Census surname data with additional data on last, first, and middle names taken from the voter files of six Southern states where self-reported race is available. Our empirical validation shows that the fBISG methodology and name supplements significantly improve the accuracy of race imputation across all racial groups, and especially for Asians. The proposed methodology, together with additional name data, is available via the open-source software package wru.  ( 2 min )
    The Mechanism of Prediction Head in Non-contrastive Self-supervised Learning. (arXiv:2205.06226v1 [cs.LG])
    Recently the surprising discovery of the Bootstrap Your Own Latent (BYOL) method by Grill et al. shows the negative term in contrastive loss can be removed if we add the so-called prediction head to the network. This initiated the research of non-contrastive self-supervised learning. It is mysterious why even when there exist trivial collapsed global optimal solutions, neural networks trained by (stochastic) gradient descent can still learn competitive representations. This phenomenon is a typical example of implicit bias in deep learning and remains little understood. In this work, we present our empirical and theoretical discoveries on non-contrastive self-supervised learning. Empirically, we find that when the prediction head is initialized as an identity matrix with only its off-diagonal entries being trainable, the network can learn competitive representations even though the trivial optima still exist in the training objective. Theoretically, we present a framework to understand the behavior of the trainable, but identity-initialized prediction head. Under a simple setting, we characterized the substitution effect and acceleration effect of the prediction head. The substitution effect happens when learning the stronger features in some neurons can substitute for learning these features in other neurons through updating the prediction head. And the acceleration effect happens when the substituted features can accelerate the learning of other weaker features to prevent them from being ignored. These two effects enable the neural networks to learn all the features rather than focus only on learning the stronger features, which is likely the cause of the dimensional collapse phenomenon. To the best of our knowledge, this is also the first end-to-end optimization guarantee for non-contrastive methods using nonlinear neural networks with a trainable prediction head and normalization.  ( 2 min )
    Gauge equivariant neural networks for quantum lattice gauge theories. (arXiv:2012.05232v2 [cond-mat.str-el] UPDATED)
    Gauge symmetries play a key role in physics appearing in areas such as quantum field theories of the fundamental particles and emergent degrees of freedom in quantum materials. Motivated by the desire to efficiently simulate many-body quantum systems with exact local gauge invariance, gauge equivariant neural-network quantum states are introduced, which exactly satisfy the local Hilbert space constraints necessary for the description of quantum lattice gauge theory with Zd gauge group on different geometries. Focusing on the special case of Z2 gauge group on a periodically identified square lattice, the equivariant architecture is analytically shown to contain the loop-gas solution as a special case. Gauge equivariant neural-network quantum states are used in combination with variational quantum Monte Carlo to obtain compact descriptions of the ground state wavefunction for the Z2 theory away from the exactly solvable limit, and to demonstrate the confining/deconfining phase transition of the Wilson loop order parameter.  ( 2 min )
    ELODI: Ensemble Logit Difference Inhibition for Positive-Congruent Training. (arXiv:2205.06265v1 [cs.LG])
    Negative flips are errors introduced in a classification system when a legacy model is replaced with a new one. Existing methods to reduce the negative flip rate (NFR) either do so at the expense of overall accuracy using model distillation, or use ensembles, which multiply inference cost prohibitively. We present a method to train a classification system that achieves paragon performance in both error rate and NFR, at the inference cost of a single model. Our method introduces a generalized distillation objective, Logit Difference Inhibition (LDI), that penalizes changes in the logits between the new and old model, without forcing them to coincide as in ordinary distillation. LDI affords the model flexibility to reduce error rate along with NFR. The method uses a homogeneous ensemble as the reference model for LDI, hence the name Ensemble LDI, or ELODI. The reference model can then be substituted with a single model at inference time. The method leverages the observation that negative flips are typically not close to the decision boundary, but often exhibit large deviations in the distance among their logits, which are reduced by ELODI.  ( 2 min )
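The negative flip rate defined above can be computed with a few lines of code. This sketch assumes the rate is normalised by the total number of evaluated samples; the abstract does not spell out the denominator, so treat that choice as an assumption.

```python
from typing import Hashable, Sequence


def negative_flip_rate(
    labels: Sequence[Hashable],
    old_preds: Sequence[Hashable],
    new_preds: Sequence[Hashable],
) -> float:
    """Fraction of samples the legacy model classified correctly
    but the new model gets wrong."""
    if not (len(labels) == len(old_preds) == len(new_preds)):
        raise ValueError("all inputs must have the same length")
    flips = sum(
        1
        for y, old, new in zip(labels, old_preds, new_preds)
        if old == y and new != y
    )
    return flips / len(labels)


if __name__ == "__main__":
    labels = [0, 1, 1, 0]
    old = [0, 1, 0, 0]  # legacy model: 3 of 4 correct
    new = [0, 0, 1, 1]  # new model: 2 of 4 correct, 2 negative flips
    print(negative_flip_rate(labels, old, new))
```

In this toy case the new model fixes one legacy error yet introduces two negative flips, which a comparison of accuracy figures alone would not reveal.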
    Learning Generalized Policies Without Supervision Using GNNs. (arXiv:2205.06002v1 [cs.AI])
    We consider the problem of learning generalized policies for classical planning domains using graph neural networks from small instances represented in lifted STRIPS. The problem has been considered before but the proposed neural architectures are complex and the results are often mixed. In this work, we use a simple and general GNN architecture and aim at obtaining crisp experimental results and a deeper understanding: either the policy greedy in the learned value function achieves close to 100% generalization over instances larger than those used in training, or the failure must be understood, and possibly fixed, logically. For this, we exploit the relation established between the expressive power of GNNs and the $C_{2}$ fragment of first-order logic (namely, FOL with 2 variables and counting quantifiers). We find for example that domains with general policies that require more expressive features can be solved with GNNs once the states are extended with suitable "derived atoms" encoding role compositions and transitive closures that do not fit into $C_{2}$. The work follows the GNN approach for learning optimal general policies in a supervised fashion (Stahlberg, Bonet, Geffner, 2022); but the learned policies are no longer required to be optimal (which expands the scope, as many planning domains do not have general optimal policies) and are learned without supervision. Interestingly, value-based reinforcement learning methods that aim to produce optimal policies, do not always yield policies that generalize, as the goals of optimality and generality are in conflict in domains where optimal planning is NP-hard.  ( 2 min )
    Learning more skills through optimistic exploration. (arXiv:2107.14226v6 [cs.LG] UPDATED)
    Unsupervised skill learning objectives (Gregor et al., 2016, Eysenbach et al., 2018) allow agents to learn rich repertoires of behavior in the absence of extrinsic rewards. They work by simultaneously training a policy to produce distinguishable latent-conditioned trajectories, and a discriminator to evaluate distinguishability by trying to infer latents from trajectories. The hope is for the agent to explore and master the environment by encouraging each skill (latent) to reliably reach different states. However, an inherent exploration problem lingers: when a novel state is actually encountered, the discriminator will necessarily not have seen enough training data to produce accurate and confident skill classifications, leading to low intrinsic reward for the agent and effective penalization of the sort of exploration needed to actually maximize the objective. To combat this inherent pessimism towards exploration, we derive an information gain auxiliary objective that involves training an ensemble of discriminators and rewarding the policy for their disagreement. Our objective directly estimates the epistemic uncertainty that comes from the discriminator not having seen enough training examples, thus providing an intrinsic reward more tailored to the true objective compared to pseudocount-based methods (Burda et al., 2019). We call this exploration bonus discriminator disagreement intrinsic reward, or DISDAIN. We demonstrate empirically that DISDAIN improves skill learning both in a tabular grid world (Four Rooms) and the 57 games of the Atari Suite (from pixels). Thus, we encourage researchers to treat pessimism with DISDAIN.  ( 2 min )
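One standard way to score disagreement within an ensemble of discriminators is the entropy of the averaged prediction minus the average of the individual entropies. The sketch below uses that measure as an illustration; whether it matches the paper's exact bonus is an assumption, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats, skipping zero-probability entries."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def disagreement_bonus(ensemble_probs: Sequence[Sequence[float]]) -> float:
    """Entropy of the mean prediction minus the mean per-member entropy.

    Each inner sequence is one discriminator's distribution over skills for
    the same state. The result is zero when all members agree and grows as
    their predictions diverge.
    """
    n = len(ensemble_probs)
    mean: List[float] = [sum(col) / n for col in zip(*ensemble_probs)]
    mean_entropy = sum(entropy(p) for p in ensemble_probs) / n
    return entropy(mean) - mean_entropy


if __name__ == "__main__":
    # Two discriminators that confidently predict different skills.
    print(disagreement_bonus([[1.0, 0.0], [0.0, 1.0]]))
```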
    AMP: Adversarial Motion Priors for Stylized Physics-Based Character Control. (arXiv:2104.02180v2 [cs.GR] UPDATED)
    Synthesizing graceful and life-like behaviors for physically simulated characters has been a fundamental challenge in computer animation. Data-driven methods that leverage motion tracking are a prominent class of techniques for producing high fidelity motions for a wide range of behaviors. However, the effectiveness of these tracking-based methods often hinges on carefully designed objective functions, and when applied to large and diverse motion datasets, these methods require significant additional machinery to select the appropriate motion for the character to track in a given scenario. In this work, we propose to obviate the need to manually design imitation objectives and mechanisms for motion selection by utilizing a fully automated approach based on adversarial imitation learning. High-level task objectives that the character should perform can be specified by relatively simple reward functions, while the low-level style of the character's behaviors can be specified by a dataset of unstructured motion clips, without any explicit clip selection or sequencing. These motion clips are used to train an adversarial motion prior, which specifies style-rewards for training the character through reinforcement learning (RL). The adversarial RL procedure automatically selects which motion to perform, dynamically interpolating and generalizing from the dataset. Our system produces high-quality motions that are comparable to those achieved by state-of-the-art tracking-based techniques, while also being able to easily accommodate large datasets of unstructured motion clips. Composition of disparate skills emerges automatically from the motion prior, without requiring a high-level motion planner or other task-specific annotations of the motion clips. We demonstrate the effectiveness of our framework on a diverse cast of complex simulated characters and a challenging suite of motor control tasks.  ( 3 min )
    A Comparative Survey of Deep Active Learning. (arXiv:2203.13450v2 [cs.LG] UPDATED)
    Active Learning (AL) is a set of techniques for reducing labeling cost by sequentially selecting data samples from a large unlabeled data pool for labeling. Meanwhile, Deep Learning (DL) is data-hungry, and the performance of DL models scales monotonically with more training data. Therefore, in recent years, Deep Active Learning (DAL) has emerged as a feasible solution for maximizing model performance while minimizing the expensive labeling cost. Many methods have been proposed, and literature reviews of DAL have been presented before. However, performance comparisons of the different branches of DAL methods under various tasks are still insufficient, and our work fills this gap. In this paper, we survey and categorize DAL-related work and construct comparative experiments across frequently used datasets and DAL algorithms. Additionally, we explore some factors (e.g., batch size, number of epochs in the training process) that influence the efficacy of DAL, which provides better references for researchers to design their own DAL experiments or carry out DAL-related applications. We construct a DAL toolkit, DeepAL+, by re-implementing many highly-cited DAL-related methods, and it will be released to the public.  ( 2 min )
    Exploiting Inductive Bias in Transformers for Unsupervised Disentanglement of Syntax and Semantics with VAEs. (arXiv:2205.05943v1 [cs.CL])
    We propose a generative model for text generation, which exhibits disentangled latent representations of syntax and semantics. Contrary to previous work, this model does not need syntactic information such as constituency parses, or semantic information such as paraphrase pairs. Our model relies solely on the inductive bias found in attention-based architectures such as Transformers. In the attention of Transformers, keys handle information selection while values specify what information is conveyed. Our model, dubbed QKVAE, uses Attention in its decoder to read latent variables where one latent variable infers keys while another infers values. We run experiments on latent representations and experiments on syntax/semantics transfer which show that QKVAE displays clear signs of disentangled syntax and semantics. We also show that our model displays competitive syntax transfer capabilities when compared to supervised models and that comparable supervised models need a fairly large amount of data (more than 50K samples) to outperform it on both syntactic and semantic transfer. The code for our experiments is publicly available.  ( 2 min )
    Exploiting symmetry in variational quantum machine learning. (arXiv:2205.06217v1 [quant-ph])
    Variational quantum machine learning is an extensively studied application of near-term quantum computers. The success of variational quantum learning models crucially depends on finding a suitable parametrization of the model that encodes an inductive bias relevant to the learning task. However, precious little is known about guiding principles for the construction of suitable parametrizations. In this work, we holistically explore when and how symmetries of the learning problem can be exploited to construct quantum learning models with outcomes invariant under the symmetry of the learning task. Building on tools from representation theory, we show how a standard gateset can be transformed into an equivariant gateset that respects the symmetries of the problem at hand through a process of gate symmetrization. We benchmark the proposed methods on two toy problems that feature a non-trivial symmetry and observe a substantial increase in generalization performance. As our tools can also be applied in a straightforward way to other variational problems with symmetric structure, we show how equivariant gatesets can be used in variational quantum eigensolvers.  ( 2 min )
    Localized Vision-Language Matching for Open-vocabulary Object Detection. (arXiv:2205.06160v1 [cs.CV])
    In this work, we propose an open-world object detection method that, based on image-caption pairs, learns to detect novel object classes along with a given set of known classes. It is a two-stage training approach that first uses a location-guided image-caption matching technique to learn class labels for both novel and known classes in a weakly-supervised manner and second specializes the model for the object detection task using known class annotations. We show that a simple language model fits better than a large contextualized language model for detecting novel objects. Moreover, we introduce a consistency-regularization technique to better exploit image-caption pair information. Our method compares favorably to existing open-world detection approaches while being data-efficient.  ( 2 min )
    Equivariant quantum circuits for learning on weighted graphs. (arXiv:2205.06109v1 [quant-ph])
    Variational quantum algorithms are the leading candidate for near-term advantage on noisy quantum hardware. When training a parametrized quantum circuit to solve a specific task, the choice of ansatz is one of the most important factors that determines the trainability and performance of the algorithm. Problem-tailored ansatzes have become the standard for tasks in optimization or quantum chemistry, and yield more efficient algorithms with better performance than unstructured approaches. In quantum machine learning (QML), however, the literature on ansatzes that are motivated by the training data structure is scarce. Considering that it is widely known that unstructured ansatzes can become untrainable with increasing system size and circuit depth, it is of key importance to also study problem-tailored circuit architectures in a QML context. In this work, we introduce an ansatz for learning tasks on weighted graphs that respects an important graph symmetry, namely equivariance under node permutations. We evaluate the performance of this ansatz on a complex learning task on weighted graphs, where a ML model is used to implement a heuristic for a combinatorial optimization problem. We analytically study the expressivity of our ansatz at depth one, and numerically compare the performance of our model on instances with up to 20 qubits to ansatzes where the equivariance property is gradually broken. We show that our ansatz outperforms all others even in the small-instance regime. Our results strengthen the notion that symmetry-preserving ansatzes are a key to success in QML and should be an active area of research in order to enable near-term advantages in this field.  ( 2 min )
    Delving into High-Quality Synthetic Face Occlusion Segmentation Datasets. (arXiv:2205.06218v1 [cs.CV])
    This paper performs comprehensive analysis on datasets for occlusion-aware face segmentation, a task that is crucial for many downstream applications. The collection and annotation of such datasets are time-consuming and labor-intensive. Although some efforts have been made in synthetic data generation, the naturalistic aspect of data remains less explored. In our study, we propose two occlusion generation techniques, Naturalistic Occlusion Generation (NatOcc), for producing high-quality naturalistic synthetic occluded faces; and Random Occlusion Generation (RandOcc), a more general synthetic occluded data generation method. We empirically show the effectiveness and robustness of both methods, even for unseen occlusions. To facilitate model evaluation, we present two high-resolution real-world occluded face datasets with fine-grained annotations, RealOcc and RealOcc-Wild, featuring both careful alignment preprocessing and an in-the-wild setting for robustness test. We further conduct a comprehensive analysis on a newly introduced segmentation benchmark, offering insights for future exploration.  ( 2 min )
    E-Mail Assistant -- Automation of E-Mail Handling and Management using Robotic Process Automation. (arXiv:2205.05882v1 [cs.LG])
    In this paper, a workflow is proposed for designing a bot using Robotic Process Automation (RPA) combined with Artificial Intelligence (AI), which is used for information extraction, classification, etc. The bot is equipped with many features that make email handling a stress-free job. It automatically logs into the mailbox through secured channels, distinguishes between useful and not useful emails, classifies the emails into different labels, downloads the attached files, creates different directories, and stores the downloaded files in the relevant directories. It moves emails that are not useful into the trash. Further, the bot can also be trained to rename the attached files with the names of the sender/applicant in the case of a job application, for the sake of convenience. The bot is designed and tested using the UiPath tool to improve the performance of the system. The paper also discusses further possible functionalities that can be added to the bot.  ( 2 min )
    Towards Robust Unsupervised Disentanglement of Sequential Data -- A Case Study Using Music Audio. (arXiv:2205.05871v1 [cs.SD])
    Disentangled sequential autoencoders (DSAEs) represent a class of probabilistic graphical models that describes an observed sequence with dynamic latent variables and a static latent variable. The former encode information at a frame rate identical to the observation, while the latter globally governs the entire sequence. This introduces an inductive bias and facilitates unsupervised disentanglement of the underlying local and global factors. In this paper, we show that the vanilla DSAE suffers from being sensitive to the choice of model architecture and capacity of the dynamic latent variables, and is prone to collapse the static latent variable. As a countermeasure, we propose TS-DSAE, a two-stage training framework that first learns sequence-level prior distributions, which are subsequently employed to regularise the model and facilitate auxiliary objectives to promote disentanglement. The proposed framework is fully unsupervised and robust against the global factor collapse problem across a wide range of model configurations. It also avoids typical solutions such as adversarial training which usually involves laborious parameter tuning, and domain-specific data augmentation. We conduct quantitative and qualitative evaluations to demonstrate its robustness in terms of disentanglement on both artificial and real-world music audio datasets.  ( 2 min )
    Unified Source-Filter GAN with Harmonic-plus-Noise Source Excitation Generation. (arXiv:2205.06053v1 [cs.SD])
    This paper introduces a unified source-filter network with a harmonic-plus-noise source excitation generation mechanism. In our previous work, we proposed unified Source-Filter GAN (uSFGAN) for developing a high-fidelity neural vocoder with flexible voice controllability using a unified source-filter neural network architecture. However, the capability of uSFGAN to model the aperiodic source excitation signal is insufficient, and there is still a gap in sound quality between the natural and generated speech. To improve the source excitation modeling and generated sound quality, a new source excitation generation network separately generating periodic and aperiodic components is proposed. The advanced adversarial training procedure of HiFiGAN is also adopted to replace that of Parallel WaveGAN used in the original uSFGAN. Both objective and subjective evaluation results show that the modified uSFGAN significantly improves the sound quality of the basic uSFGAN while maintaining the voice controllability.  ( 2 min )
    Optimal transport weights for causal inference. (arXiv:2109.01991v4 [stat.ME] UPDATED)
    Imbalance in covariate distributions leads to biased estimates of causal effects. Weighting methods attempt to correct this imbalance but rely on specifying models for the treatment assignment mechanism, which is unknown in observational studies. This leaves researchers to choose the proper weighting method and the appropriate covariate functions for these models without knowing the correct combination to achieve distributional balance. In response to these difficulties, we propose a nonparametric generalization of several other weighting schemes found in the literature: Causal Optimal Transport. This new method directly targets distributional balance by minimizing optimal transport distances between treatment and control groups or, more generally, between any source and target population. Our approach is semiparametrically efficient and model-free but can also incorporate moments or any other important functions of covariates that a researcher desires to balance. Moreover, our method can provide a nonparametric estimate of the conditional mean outcome function, and we give rates for the convergence of this estimator. We also show how this method can provide nonparametric imputations of the missing potential outcomes and give rates of convergence for this estimator. We find that Causal Optimal Transport outperforms competitor methods when both the propensity score and outcome models are misspecified, indicating it is a robust alternative to common weighting methods. Finally, we demonstrate the utility of our method in an external control trial examining the effect of misoprostol versus oxytocin for the treatment of post-partum hemorrhage.  ( 2 min )
    Communicative Subgraph Representation Learning for Multi-Relational Inductive Drug-Gene Interaction Prediction. (arXiv:2205.05957v1 [cs.LG])
    Illuminating the interconnections between drugs and genes is an important topic in drug development and precision medicine. Currently, computational predictions of drug-gene interactions mainly focus on the binding interactions without considering other relation types like agonist, antagonist, etc. In addition, existing methods either heavily rely on high-quality domain features or are intrinsically transductive, which limits the capacity of models to generalize to drugs/genes that lack external information or are unseen during the training process. To address these problems, we propose a novel Communicative Subgraph representation learning for Multi-relational Inductive drug-Gene interactions prediction (CoSMIG), where the predictions of drug-gene relations are made through subgraph patterns, and thus are naturally inductive for unseen drugs/genes without retraining or utilizing external domain features. Moreover, the model strengthened the relations on the drug-gene graph through a communicative message passing mechanism. To evaluate our method, we compiled two new benchmark datasets from DrugBank and DGIdb. The comprehensive experiments on the two datasets showed that our method outperformed state-of-the-art baselines in the transductive scenarios and achieved superior performance in the inductive ones. Further experimental analysis including LINCS experimental validation and literature verification also demonstrated the value of our model.
    Embodied vision for learning object representations. (arXiv:2205.06198v1 [cs.LG])
    Recent time-contrastive learning approaches manage to learn invariant object representations without supervision. This is achieved by mapping successive views of an object onto close-by internal representations. When considering this learning approach as a model of the development of human object recognition, it is important to consider what visual input a toddler would typically observe while interacting with objects. First, human vision is highly foveated, with high resolution only available in the central region of the field of view. Second, objects may be seen against a blurry background due to infants' limited depth of field. Third, during object manipulation a toddler mostly observes close objects filling a large part of the field of view due to their rather short arms. Here, we study how these effects impact the quality of visual representations learnt through time-contrastive learning. To this end, we let a visually embodied agent "play" with objects in different locations of a near photo-realistic flat. During each play session the agent views an object in multiple orientations before turning its body to view another object. The resulting sequence of views feeds a time-contrastive learning algorithm. Our results show that visual statistics mimicking those of a toddler improve object recognition accuracy in both familiar and novel environments. We argue that this effect is caused by the reduction of features extracted in the background, a neural network bias for large features in the image and a greater similarity between novel and familiar background regions. We conclude that the embodied nature of visual learning may be crucial for understanding the development of human object perception.
    Smooth-Reduce: Leveraging Patches for Improved Certified Robustness. (arXiv:2205.06154v1 [cs.LG])
    Randomized smoothing (RS) has been shown to be a fast, scalable technique for certifying the robustness of deep neural network classifiers. However, methods based on RS require augmenting data with large amounts of noise, which leads to significant drops in accuracy. We propose a training-free, modified smoothing approach, Smooth-Reduce, that leverages patching and aggregation to provide improved classifier certificates. Our algorithm classifies overlapping patches extracted from an input image, and aggregates the predicted logits to certify a larger radius around the input. We study two aggregation schemes -- max and mean -- and show that both approaches provide better certificates in terms of certified accuracy, average certified radii and abstention rates as compared to concurrent approaches. We also provide theoretical guarantees for such certificates, and empirically show significant improvements over other randomized smoothing methods that require expensive retraining. Further, we extend our approach to videos and provide meaningful certificates for video classifiers. A project page can be found at https://nyu-dice-lab.github.io/SmoothReduce/  ( 2 min )
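    The patch-and-aggregate step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the image representation, `toy_classifier`, and the patch size and stride are hypothetical stand-ins, and the noise sampling and certification steps of randomized smoothing are omitted entirely.

```python
from typing import List, Sequence

Image = List[List[float]]  # 2D grayscale image as nested lists (assumption)


def extract_patches(image: Image, size: int, stride: int) -> List[Image]:
    """Return size x size patches taken every `stride` pixels (overlapping if stride < size)."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - size + 1, stride):
        for left in range(0, width - size + 1, stride):
            patches.append([row[left:left + size] for row in image[top:top + size]])
    return patches


def aggregate_logits(per_patch_logits: Sequence[Sequence[float]], scheme: str) -> List[float]:
    """Combine per-patch logits class by class using 'max' or 'mean'."""
    columns = list(zip(*per_patch_logits))
    if scheme == "max":
        return [max(col) for col in columns]
    if scheme == "mean":
        return [sum(col) / len(col) for col in columns]
    raise ValueError(f"unknown scheme: {scheme}")


def toy_classifier(patch: Image) -> List[float]:
    """Hypothetical two-class stand-in: logits are mean brightness and its negation."""
    pixels = [p for row in patch for p in row]
    mean = sum(pixels) / len(pixels)
    return [mean, -mean]


# A 4x4 image with pixel values 0..15, cut into nine overlapping 2x2 patches.
image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
patches = extract_patches(image, size=2, stride=1)
logits = [toy_classifier(p) for p in patches]
max_logits = aggregate_logits(logits, "max")
mean_logits = aggregate_logits(logits, "mean")
```

    In this toy setting the two schemes differ only in how they pool the nine per-patch logit vectors; the certified-radius computation that the abstract refers to would sit on top of the aggregated output.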
    GPN: A Joint Structural Learning Framework for Graph Neural Networks. (arXiv:2205.05964v1 [cs.LG])
    Graph neural networks (GNNs) have been applied to a variety of graph tasks. Most existing work on GNNs assumes that the given graph data is optimal, yet training graphs inevitably contain missing or incomplete edges, which degrades performance. In this paper, we propose Generative Predictive Network (GPN), a GNN-based joint learning framework that simultaneously learns the graph structure and the downstream task. Specifically, we develop a bilevel optimization framework for this joint learning task, in which the upper optimization (generator) and the lower optimization (predictor) are both instantiated with GNNs. To the best of our knowledge, our method is the first GNN-based bilevel optimization framework for resolving this task. In extensive experiments on benchmark datasets, our method outperforms a wide range of baselines.
    Graph Masked Autoencoders with Transformers. (arXiv:2202.08391v2 [cs.LG] UPDATED)
    Recently, transformers have shown promising performance in learning graph representations. However, there are still some challenges when applying transformers to real-world scenarios due to the fact that deep transformers are hard to train from scratch and the quadratic memory consumption w.r.t. the number of nodes. In this paper, we propose Graph Masked Autoencoders (GMAEs), a self-supervised transformer-based model for learning graph representations. To address the above two challenges, we adopt the masking mechanism and the asymmetric encoder-decoder design. Specifically, GMAE takes partially masked graphs as input, and reconstructs the features of the masked nodes. The encoder and decoder are asymmetric, where the encoder is a deep transformer and the decoder is a shallow transformer. The masking mechanism and the asymmetric design make GMAE a memory-efficient model compared with conventional transformers. We show that, when serving as a conventional self-supervised graph representation model, GMAE achieves state-of-the-art performance on both the graph classification task and the node classification task under common downstream evaluation protocols. We also show that, compared with training in an end-to-end manner from scratch, we can achieve comparable performance after pre-training and fine-tuning using GMAE while simplifying the training process.
    A non-asymptotic approach for model selection via penalization in high-dimensional mixture of experts models. (arXiv:2104.02640v2 [math.ST] UPDATED)
    Mixture of experts (MoE) are a popular class of statistical and machine learning models that have gained attention over the years due to their flexibility and efficiency. In this work, we consider Gaussian-gated localized MoE (GLoME) and block-diagonal covariance localized MoE (BLoME) regression models to present nonlinear relationships in heterogeneous data with potential hidden graph-structured interactions between high-dimensional predictors. These models pose difficult statistical estimation and model selection questions, both from a computational and theoretical perspective. This paper is devoted to the study of the problem of model selection among a collection of GLoME or BLoME models characterized by the number of mixture components, the complexity of Gaussian mean experts, and the hidden block-diagonal structures of the covariance matrices, in a penalized maximum likelihood estimation framework. In particular, we establish non-asymptotic risk bounds that take the form of weak oracle inequalities, provided that lower bounds for the penalties hold. The good empirical behavior of our models is then demonstrated on synthetic and real datasets.
    Robustness and Reliability When Training With Noisy Labels. (arXiv:2110.03321v2 [stat.ML] UPDATED)
    Labelling of data for supervised learning can be costly and time-consuming and the risk of incorporating label noise in large data sets is imminent. When training a flexible discriminative model using a strictly proper loss, such noise will inevitably shift the solution towards the conditional distribution over noisy labels. Nevertheless, while deep neural networks have proven capable of fitting random labels, regularisation and the use of robust loss functions empirically mitigate the effects of label noise. However, such observations concern robustness in accuracy, which is insufficient if reliable uncertainty quantification is critical. We demonstrate this by analysing the properties of the conditional distribution over noisy labels for an input-dependent noise model. In addition, we evaluate the set of robust loss functions characterised by noise-insensitive, asymptotic risk minimisers. We find that strictly proper and robust loss functions both offer asymptotic robustness in accuracy, but neither guarantee that the final model is calibrated. Moreover, even with robust loss functions, overfitting is an issue in practice. With these results, we aim to explain observed robustness of common training practices, such as early stopping, to label noise. In addition, we aim to encourage the development of new noise-robust algorithms that not only preserve accuracy but that also ensure reliability.
    A Lightweight Instrument-Agnostic Model for Polyphonic Note Transcription and Multipitch Estimation. (arXiv:2203.09893v2 [cs.SD] UPDATED)
    Automatic Music Transcription (AMT) has been recognized as a key enabling technology with a wide range of applications. Given the task's complexity, best results have typically been reported for systems focusing on specific settings, e.g. instrument-specific systems tend to yield improved results over instrument-agnostic methods. Similarly, higher accuracy can be obtained when only estimating frame-wise $f_0$ values and neglecting the harder note event detection. Despite their high accuracy, such specialized systems often cannot be deployed in the real-world. Storage and network constraints prohibit the use of multiple specialized models, while memory and run-time constraints limit their complexity. In this paper, we propose a lightweight neural network for musical instrument transcription, which supports polyphonic outputs and generalizes to a wide variety of instruments (including vocals). Our model is trained to jointly predict frame-wise onsets, multipitch and note activations, and we experimentally show that this multi-output structure improves the resulting frame-level note accuracy. Despite its simplicity, benchmark results show our system's note estimation to be substantially better than a comparable baseline, and its frame-level accuracy to be only marginally below those of specialized state-of-the-art AMT systems. With this work we hope to encourage the community to further investigate low-resource, instrument-agnostic AMT systems.
    Virtual twins of nonlinear vibrating multiphysics microstructures: physics-based versus deep learning-based approaches. (arXiv:2205.05928v1 [math.DS])
    Micro-Electro-Mechanical-Systems are complex structures, often involving nonlinearities of geometric and multiphysics nature, that are used as sensors and actuators in countless applications. Starting from full-order representations, we apply deep learning techniques to generate accurate, efficient and real-time reduced order models to be used as virtual twins for the simulation and optimization of higher-level complex systems. We extensively test the reliability of the proposed procedures on micromirrors, arches and gyroscopes, also displaying intricate dynamical evolutions like internal resonances. In particular, we discuss the accuracy of the deep learning technique and its ability to replicate and converge to the invariant manifolds predicted using the recently developed direct parametrization approach that allows extracting the nonlinear normal modes of large finite element models. Finally, by addressing an electromechanical gyroscope, we show that the non-intrusive deep learning approach generalizes easily to complex multiphysics problems.
    Pseudo-Label Guided Multi-Contrast Generalization for Non-Contrast Organ-Aware Segmentation. (arXiv:2205.05898v1 [eess.IV])
    Non-contrast computed tomography (NCCT) is commonly acquired for lung cancer screening, assessment of general abdominal pain or suspected renal stones, trauma evaluation, and many other indications. However, the absence of contrast makes it hard to distinguish the boundaries between organs. In this paper, we propose a novel unsupervised approach that leverages pairwise contrast-enhanced CT (CECT) context to compute non-contrast segmentation without ground-truth labels. Unlike generative adversarial approaches, we compute the pairwise morphological context with CECT to provide teacher guidance instead of generating fake anatomical context. Additionally, we further augment the intensity correlations in 'organ-specific' settings and increase the sensitivity to organ-aware boundaries. We validate our approach on multi-organ segmentation with paired non-contrast & contrast-enhanced CT scans using five-fold cross-validation. Full external validations are performed on an independent non-contrast cohort for aorta segmentation. Compared with the current state-of-the-art for abdominal organ segmentation in the fully supervised setting, our proposed pipeline achieves a significantly higher Dice by 3.98% (internal multi-organ annotated), and 8.00% (external aorta annotated). The code and pretrained models are publicly available at https://github.com/MASILab/ContrastMix.
    SIBILA: High-performance computing and interpretable machine learning join efforts toward personalised medicine in a novel decision-making tool. (arXiv:2205.06234v1 [cs.LG])
    Background and Objectives: Personalised medicine remains a major challenge for scientists. The rapid growth of Machine learning and Deep learning has made it a feasible alternative for predicting the most appropriate therapy for individual patients. However, the lack of interpretation of their results and high computational requirements make many reluctant to use these methods. Methods: Several Machine learning and Deep learning models have been implemented into a single software tool, SIBILA. Once the models are trained, SIBILA applies a range of interpretability methods to identify the input features that each model considered the most important to predict. In addition, all the features obtained are put in common to estimate the global attribution of each variable to the predictions. To facilitate its use by non-experts, SIBILA is also available to all users free of charge as a web server at https://bio-hpc.ucam.edu/sibila/. Results: SIBILA has been applied to three case studies to show its accuracy and efficiency in classification and regression problems. The first two cases proved that SIBILA can make accurate predictions even on uncleaned datasets. The last case demonstrates that SIBILA can be applied to medical contexts with real data. Conclusion: With the aim of becoming a powerful decision-making tool for clinicians, SIBILA has been developed. SIBILA is a novel software tool that leverages interpretable machine learning to make accurate predictions and explain how models made those decisions. SIBILA can be run on high-performance computing platforms, drastically reducing computing times.  ( 2 min )
    Combining Learning from Human Feedback and Knowledge Engineering to Solve Hierarchical Tasks in Minecraft. (arXiv:2112.03482v2 [cs.LG] UPDATED)
    Real-world tasks of interest are generally poorly defined by human-readable descriptions and have no pre-defined reward signals unless these are defined by a human designer. Conversely, data-driven algorithms are often designed to solve a specific, narrowly defined task with performance metrics that drive the agent's learning. In this work, we present the solution that won first place and was awarded the most human-like agent in the 2021 NeurIPS Competition MineRL BASALT Challenge: Learning from Human Feedback in Minecraft, which challenged participants to use human data to solve four tasks defined only by a natural language description and no reward function. Our approach uses the available human demonstration data to train an imitation learning policy for navigation and additional human feedback to train an image classifier. These modules, combined with an estimated odometry map, become a powerful state-machine designed to utilize human knowledge in a natural hierarchical paradigm. We compare this hybrid intelligence approach to both end-to-end machine learning and pure engineered solutions, which are then judged by human evaluators. Codebase is available at https://github.com/viniciusguigo/kairos_minerl_basalt.
    An $l_1$-oracle inequality for the Lasso in mixture-of-experts regression models. (arXiv:2009.10622v3 [math.ST] UPDATED)
    Mixture-of-experts (MoE) models are a popular framework for modeling heterogeneity in data, for both regression and classification problems in statistics and machine learning, due to their flexibility and the abundance of available statistical estimation and model choice tools. Such flexibility comes from allowing the mixture weights (or gating functions) in the MoE model to depend on the explanatory variables, along with the experts (or component densities). This permits the modeling of data arising from more complex data generating processes when compared to the classical finite mixtures and finite mixtures of regression models, whose mixing parameters are independent of the covariates. The use of MoE models in a high-dimensional setting, when the number of explanatory variables can be much larger than the sample size, is challenging from a computational point of view, and in particular from a theoretical point of view, where the literature is still lacking results for dealing with the curse of dimensionality, for both the statistical estimation and feature selection problems. We consider the finite MoE model with soft-max gating functions and Gaussian experts for high-dimensional regression on heterogeneous data, and its $l_1$-regularized estimation via the Lasso. We focus on the Lasso estimation properties rather than its feature selection properties. We provide a lower bound on the regularization parameter of the Lasso function that ensures an $l_1$-oracle inequality satisfied by the Lasso estimator according to the Kullback--Leibler loss.
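    For readers unfamiliar with the model class, the objects named in this abstract can be written out in generic notation. The symbols below ($K$ experts, gating parameters $\gamma$, expert means $v_k$, variances $\sigma_k^2$, penalty level $\lambda$) are assumptions chosen for illustration and need not match the paper's notation; in particular, which parameters enter the $l_1$ penalty is not specified in the abstract.

```latex
% Conditional density of a soft-max-gated MoE with Gaussian experts
\[
  s_\psi(y \mid x) \;=\; \sum_{k=1}^{K} g_k(x;\gamma)\,
    \phi\bigl(y;\, v_k(x),\, \sigma_k^2\bigr),
  \qquad
  g_k(x;\gamma) \;=\;
    \frac{\exp\bigl(\gamma_{k0} + \gamma_k^\top x\bigr)}
         {\sum_{l=1}^{K} \exp\bigl(\gamma_{l0} + \gamma_l^\top x\bigr)} .
\]
% l_1-penalized maximum likelihood (Lasso) estimator
\[
  \hat{s}(\lambda) \;=\; \arg\min_{s_\psi}
    \Bigl\{ -\frac{1}{n} \sum_{i=1}^{n} \ln s_\psi(y_i \mid x_i)
            \;+\; \lambda\, \lVert \psi \rVert_1 \Bigr\} .
\]
```

    Here $\phi(\cdot;\mu,\sigma^2)$ is the Gaussian density; the abstract's result is a lower bound on $\lambda$ under which an estimator of this kind satisfies an $l_1$-oracle inequality in Kullback--Leibler loss.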
    Social learning via actions in bandit environments. (arXiv:2205.06107v1 [econ.TH])
    I study a game of strategic exploration with private payoffs and public actions in a Bayesian bandit setting. In particular, I look at cascade equilibria, in which agents switch over time from the risky action to the riskless action only when they become sufficiently pessimistic. I show that these equilibria exist under some conditions and establish their salient properties. Individual exploration in these equilibria can be more or less than the single-agent level depending on whether the agents start out with a common prior or not, but the most optimistic agent always underexplores. I also show that allowing the agents to write enforceable ex-ante contracts will lead to the most ex-ante optimistic agent to buy all payoff streams, providing an explanation to the buying out of smaller start-ups by more established firms.  ( 2 min )
    The Implicit Bias of Benign Overfitting. (arXiv:2201.11489v3 [cs.LG] UPDATED)
    The phenomenon of benign overfitting, where a predictor perfectly fits noisy training data while attaining low expected loss, has received much attention in recent years, but still remains not fully understood beyond well-specified linear regression setups. In this paper, we provide several new results on when one can or cannot expect benign overfitting to occur, for both regression and classification tasks. We consider a prototypical and rather generic data model for benign overfitting of linear predictors, where an arbitrary input distribution of some fixed dimension $k$ is concatenated with a high-dimensional distribution. For linear regression which is not necessarily well-specified, we show that the minimum-norm interpolating predictor (that standard training methods converge to) is biased towards an inconsistent solution in general, hence benign overfitting will generally not occur. Moreover, we show how this can be extended beyond standard linear regression, by an argument proving how the existence of benign overfitting on some regression problems precludes its existence on other regression problems. We then turn to classification problems, and show that the situation there is much more favorable. Specifically, we prove that the max-margin predictor (to which standard training methods are known to converge in direction) is asymptotically biased towards minimizing a weighted squared hinge loss. This allows us to reduce the question of benign overfitting in classification to the simpler question of whether this loss is a good surrogate for the misclassification error, and use it to show benign overfitting in some new settings.
    Healthy Twitter discussions? Time will tell. (arXiv:2203.11261v2 [cs.SI] UPDATED)
    Studying misinformation and how to deal with unhealthy behaviours within online discussions has recently become an important field of research within social studies. With the rapid development of social media, and the increasing amount of available information and sources, rigorous manual analysis of such discourses has become unfeasible. Many approaches tackle the issue by studying the semantic and syntactic properties of discussions following a supervised approach, for example using natural language processing on a dataset labeled for abusive, fake or bot-generated content. Solutions based on the existence of a ground truth are limited to those domains which may have ground truth. However, within the context of misinformation, it may be difficult or even impossible to assign labels to instances. In this context, we consider the use of temporal dynamic patterns as an indicator of discussion health. Working in a domain for which ground truth was unavailable at the time (early COVID-19 pandemic discussions), we explore the characterization of discussions based on the volume and time of contributions. First we explore the types of discussions in an unsupervised manner, and then characterize these types using the concept of ephemerality, which we formalize. In the end, we discuss the potential use of our ephemerality definition for labeling online discourses based on how desirable, healthy and constructive they are.
    Neural Network-based OFDM Receiver for Resource Constrained IoT Devices. (arXiv:2205.06159v1 [eess.SP])
    Orthogonal Frequency Division Multiplexing (OFDM)-based waveforms are used for communication links in many current and emerging Internet of Things (IoT) applications, including the latest WiFi standards. For such OFDM-based transceivers, many core physical layer functions related to channel estimation, demapping, and decoding are implemented for specific choices of channel types and modulation schemes, among others. To decouple hard-wired choices from the receiver chain and thereby enhance the flexibility of IoT deployment in many novel scenarios without changing the underlying hardware, we explore a novel, modular Machine Learning (ML)-based receiver chain design. Here, ML blocks replace the individual processing blocks of an OFDM receiver, and we specifically describe this swapping for the legacy channel estimation, symbol demapping, and decoding blocks with Neural Networks (NNs). A unique aspect of this modular design is providing flexible allocation of processing functions to the legacy or ML blocks, allowing them to interchangeably coexist. Furthermore, we study the implementation cost-benefits of the proposed NNs in resource-constrained IoT devices through pruning and quantization, as well as emulation of these compressed NNs within Field Programmable Gate Arrays (FPGAs). Our evaluations demonstrate that the proposed modular NN-based receiver improves the bit error rate of the traditional non-ML receiver by 61% and 10% on average for the simulated and over-the-air datasets, respectively. We further show complexity-performance tradeoffs by presenting computational complexity comparisons between the traditional algorithms and the proposed compressed NNs.
    Improved Sample Complexity Bounds for Branch-and-Cut. (arXiv:2111.11207v2 [cs.LG] UPDATED)
    Branch-and-cut is the most widely used algorithm for solving integer programs, employed by commercial solvers like CPLEX and Gurobi. Branch-and-cut has a wide variety of tunable parameters that have a huge impact on the size of the search tree that it builds, but are challenging to tune by hand. An increasingly popular approach is to use machine learning to tune these parameters: using a training set of integer programs from the application domain at hand, the goal is to find a configuration with strong predicted performance on future, unseen integer programs from the same domain. If the training set is too small, a configuration may have good performance over the training set but poor performance on future integer programs. In this paper, we prove sample complexity guarantees for this procedure, which bound how large the training set should be to ensure that for any configuration, its average performance over the training set is close to its expected future performance. Our guarantees apply to parameters that control the most important aspects of branch-and-cut: node selection, branching constraint selection, and cutting plane selection, and are sharper and more general than those found in prior research.
    AiSocrates: Towards Answering Ethical Quandary Questions. (arXiv:2205.05989v1 [cs.CL])
    Considerable advancements have been made in various NLP tasks based on the impressive power of large pre-trained language models (LLMs). These results have inspired efforts to understand the limits of LLMs so as to evaluate how far we are from achieving human level general natural language understanding. In this work, we challenge the capability of LLMs with the new task of Ethical Quandary Generative Question Answering. Ethical quandary questions are more challenging to address because multiple conflicting answers may exist to a single quandary. We propose a system, AiSocrates, that provides an answer with a deliberative exchange of different perspectives to an ethical quandary, in the approach of Socratic philosophy, instead of providing a closed answer like an oracle. AiSocrates searches for different ethical principles applicable to the ethical quandary and generates an answer conditioned on the chosen principles through prompt-based few-shot learning. We also address safety concerns by providing a human controllability option in choosing ethical principles. We show that AiSocrates generates promising answers to ethical quandary questions with multiple perspectives, 6.92% more often than answers written by human philosophers by one measure, but the system still needs improvement to match the coherence of human philosophers fully. We argue that AiSocrates is a promising step toward developing an NLP system that incorporates human values explicitly by prompt instructions. We are releasing the code for research purposes.
    Open-vocabulary Object Detection via Vision and Language Knowledge Distillation. (arXiv:2104.13921v3 [cs.CV] UPDATED)
    We aim at advancing open-vocabulary object detection, which detects objects described by arbitrary text inputs. The fundamental challenge is the availability of training data. It is costly to further scale up the number of classes contained in existing object detection datasets. To overcome this challenge, we propose ViLD, a training method via Vision and Language knowledge Distillation. Our method distills the knowledge from a pretrained open-vocabulary image classification model (teacher) into a two-stage detector (student). Specifically, we use the teacher model to encode category texts and image regions of object proposals. Then we train a student detector, whose region embeddings of detected boxes are aligned with the text and image embeddings inferred by the teacher. We benchmark on LVIS by holding out all rare categories as novel categories that are not seen during training. ViLD obtains 16.1 mask AP$_r$ with a ResNet-50 backbone, even outperforming the supervised counterpart by 3.8. When trained with a stronger teacher model ALIGN, ViLD achieves 26.3 AP$_r$. The model can directly transfer to other datasets without finetuning, achieving 72.2 AP$_{50}$ on PASCAL VOC, 36.6 AP on COCO and 11.8 AP on Objects365. On COCO, ViLD outperforms the previous state-of-the-art by 4.8 on novel AP and 11.4 on overall AP. Code and demo are open-sourced at https://github.com/tensorflow/tpu/tree/master/models/official/detection/projects/vild.
    Fair NLP Models with Differentially Private Text Encoders. (arXiv:2205.06135v1 [cs.CL])
    Encoded text representations often capture sensitive attributes about individuals (e.g., race or gender), which raise privacy concerns and can make downstream models unfair to certain groups. In this work, we propose FEDERATE, an approach that combines ideas from differential privacy and adversarial training to learn private text representations that also induce fairer models. We empirically evaluate the trade-off between the privacy of the representations and the fairness and accuracy of the downstream model on four NLP datasets. Our results show that FEDERATE consistently improves upon previous methods, and thus suggest that privacy and fairness can positively reinforce each other.
    Energy-Based Learning for Cooperative Games, with Applications to Valuation Problems in Machine Learning. (arXiv:2106.02938v4 [cs.LG] UPDATED)
    Valuation problems, such as feature interpretation, data valuation and model valuation for ensembles, become increasingly more important in many machine learning applications. Such problems are commonly solved by well-known game-theoretic criteria, such as Shapley value or Banzhaf value. In this work, we present a novel energy-based treatment for cooperative games, with a theoretical justification by the maximum entropy framework. Surprisingly, by conducting variational inference of the energy-based model, we recover various game-theoretic valuation criteria through conducting one-step fixed point iteration for maximizing the mean-field ELBO objective. This observation also verifies the rationality of existing criteria, as they are all attempting to decouple the correlations among the players through the mean-field approach. By running fixed point iteration for multiple steps, we achieve a trajectory of the valuations, among which we define the valuation with the best conceivable decoupling error as the Variational Index. We prove that under uniform initializations, these variational valuations all satisfy a set of game-theoretic axioms. We experimentally demonstrate that the proposed Variational Index enjoys lower decoupling error and better valuation performance on certain synthetic and real-world valuation problems.  ( 2 min )
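    As background for the game-theoretic criteria this abstract builds on, here is a minimal sketch of the classical Shapley value, computed exactly by enumerating coalitions. It illustrates the well-known criterion only, not the paper's energy-based model or its Variational Index; the three-player game `toy_value` is a hypothetical example.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(
    players: Sequence[str],
    value: Callable[[FrozenSet[str]], float],
) -> Dict[str, float]:
    """Exact Shapley values: weighted average of each player's marginal contributions."""
    n = len(players)
    result = {}
    for player in players:
        others = [p for p in players if p != player]
        total = 0.0
        for size in range(n):
            # Probability that exactly `size` given players precede `player` in a random order.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                members = frozenset(coalition)
                total += weight * (value(members | {player}) - value(members))
        result[player] = total
    return result


def toy_value(coalition: FrozenSet[str]) -> float:
    """Hypothetical game: 'a' and 'b' are worth 10 only together; 'c' always adds 2."""
    worth = 10.0 if {"a", "b"} <= coalition else 0.0
    return worth + (2.0 if "c" in coalition else 0.0)


values = shapley_values(["a", "b", "c"], toy_value)
```

    The enumeration is exponential in the number of players, which is one reason approximate or variational treatments of valuation problems are of interest.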
    Generating Fair Universal Representations using Adversarial Models. (arXiv:1910.00411v7 [cs.LG] UPDATED)
    We present a data-driven framework for learning fair universal representations (FUR) that guarantee statistical fairness for any learning task that may not be known a priori. Our framework leverages recent advances in adversarial learning to allow a data holder to learn representations in which a set of sensitive attributes are decoupled from the rest of the dataset. We formulate this as a constrained minimax game between an encoder and an adversary where the constraint ensures a measure of usefulness (utility) of the representation. The resulting problem is that of censoring, i.e., finding a representation that is least informative about the sensitive attributes given a utility constraint. For appropriately chosen adversarial loss functions, our censoring framework precisely clarifies the optimal adversarial strategy against strong information-theoretic adversaries; it also achieves the fairness measure of demographic parity for the resulting constrained representations. We evaluate the performance of our proposed framework on both synthetic and publicly available datasets. For these datasets, we use two tradeoff measures: censoring vs. representation fidelity and fairness vs. utility for downstream tasks, to amply demonstrate that multiple sensitive features can be effectively censored even as the resulting fair representations ensure accuracy for multiple downstream tasks.  ( 2 min )
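    The fairness measure named in this abstract, demographic parity, can be checked on a set of predictions with a few lines of code. The sketch below is a generic illustration with made-up predictions and group labels; it does not reproduce the paper's adversarial censoring framework.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def positive_rates(predictions: Sequence[int], groups: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Fraction of positive (== 1) predictions within each group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for pred, group in zip(predictions, groups):
        counts[group][0] += int(pred == 1)
        counts[group][1] += 1
    return {group: pos / total for group, (pos, total) in counts.items()}


def demographic_parity_gap(predictions: Sequence[int], groups: Sequence[Hashable]) -> float:
    """Largest difference in positive rate between any two groups (0 means parity)."""
    rates = positive_rates(predictions, groups)
    return max(rates.values()) - min(rates.values())


# Hypothetical data: group "x" receives positives 3/4 of the time, group "y" 1/4.
predictions = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["x", "x", "x", "x", "y", "y", "y", "y"]
gap = demographic_parity_gap(predictions, groups)
```

    A gap of zero means every group receives positive predictions at the same rate, which is the condition demographic parity asks for.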
    So Cloze yet so Far: N400 Amplitude is Better Predicted by Distributional Information than Human Predictability Judgements. (arXiv:2109.01226v2 [cs.CL] UPDATED)
    More predictable words are easier to process - they are read faster and elicit smaller neural signals associated with processing difficulty, most notably, the N400 component of the event-related brain potential. Thus, it has been argued that prediction of upcoming words is a key component of language comprehension, and that studying the amplitude of the N400 is a valuable way to investigate the predictions we make. In this study, we investigate whether the linguistic predictions of computational language models or humans better reflect the way in which natural language stimuli modulate the amplitude of the N400. One important difference in the linguistic predictions of humans versus computational language models is that while language models base their predictions exclusively on the preceding linguistic context, humans may rely on other factors. We find that the predictions of three top-of-the-line contemporary language models - GPT-3, RoBERTa, and ALBERT - match the N400 more closely than human predictions. This suggests that the predictive processes underlying the N400 may be more sensitive to the surface-level statistics of language than previously thought.  ( 2 min )
    Efficient Federated Learning for AIoT Applications Using Knowledge Distillation. (arXiv:2111.14347v2 [cs.LG] UPDATED)
    As a promising distributed machine learning paradigm, Federated Learning (FL) trains a central model with decentralized data without compromising user privacy, which has made it widely used by Artificial Intelligence Internet of Things (AIoT) applications. However, the traditional FL suffers from model inaccuracy since it trains local models using hard labels of data and ignores useful information of incorrect predictions with small probabilities. Although various solutions try to tackle the bottleneck of the traditional FL, most of them introduce significant communication and memory overhead, making the deployment of large-scale AIoT devices a great challenge. To address the above problem, this paper presents a novel Distillation-based Federated Learning (DFL) architecture that enables efficient and accurate FL for AIoT applications. Inspired by Knowledge Distillation (KD) that can increase the model accuracy, our approach adds the soft targets used by KD to the FL model training, which occupies negligible network resources. The soft targets are generated by local sample predictions of each AIoT device after each round of local training and used for the next round of model training. During the local training of DFL, both soft targets and hard labels are used as approximation objectives of model predictions to improve model accuracy by supplementing the knowledge of soft targets. To further improve the performance of our DFL model, we design a dynamic adjustment strategy for tuning the ratio of two loss functions used in KD, which can maximize the use of both soft targets and hard labels. Comprehensive experimental results on well-known benchmarks show that our approach can significantly improve the model accuracy of FL with both Independent and Identically Distributed (IID) and non-IID data.  ( 2 min )
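    The idea of training against both soft targets and hard labels can be sketched as a weighted sum of two cross-entropy terms. This is the standard knowledge-distillation form and an assumption about the loss structure, not the paper's exact objective; `alpha` is a hypothetical fixed mixing ratio, whereas the abstract describes a dynamic adjustment strategy that is not reproduced here.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target: Sequence[float], predicted: Sequence[float]) -> float:
    """Cross-entropy between a target distribution and a predicted distribution."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)


def distillation_loss(
    logits: Sequence[float],
    hard_label: int,
    soft_target: Sequence[float],
    alpha: float,
    temperature: float = 1.0,
) -> float:
    """alpha * (loss against soft targets) + (1 - alpha) * (loss against the hard label)."""
    one_hot = [1.0 if i == hard_label else 0.0 for i in range(len(logits))]
    hard_loss = cross_entropy(one_hot, softmax(logits))
    soft_loss = cross_entropy(soft_target, softmax(logits, temperature))
    return alpha * soft_loss + (1.0 - alpha) * hard_loss


# Hypothetical example: soft targets come from a previous round's predictions.
loss = distillation_loss([2.0, 0.5, -1.0], hard_label=0,
                         soft_target=[0.7, 0.2, 0.1], alpha=0.3, temperature=2.0)
```

    Setting `alpha` to 0 recovers ordinary hard-label training, and raising it shifts weight toward the soft targets.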
    Training Uncertainty-Aware Classifiers with Conformalized Deep Learning. (arXiv:2205.05878v1 [stat.ML])
    Deep neural networks are powerful tools to detect hidden patterns in data and leverage them to make predictions, but they are not designed to understand uncertainty and estimate reliable probabilities. In particular, they tend to be overconfident. We address this problem by developing a novel training algorithm that can lead to more dependable uncertainty estimates, without sacrificing predictive power. The idea is to mitigate overconfidence by minimizing a loss function, inspired by advances in conformal inference, that quantifies model uncertainty by carefully leveraging hold-out data. Experiments with synthetic and real data demonstrate this method leads to smaller conformal prediction sets with higher conditional coverage, after exact calibration with hold-out data, compared to state-of-the-art alternatives.  ( 2 min )
    Performing Video Frame Prediction of Microbial Growth with a Recurrent Neural Network. (arXiv:2205.05810v1 [cs.LG])
    A Recurrent Neural Network (RNN) was used to perform video frame prediction of microbial growth for a population of two mutants of Pseudomonas aeruginosa. The RNN was trained on videos of 20 frames that were acquired using fluorescence microscopy and microfluidics. The network predicted the last 10 frames of each video, and the accuracy of the predictions was assessed by comparing raw images, population curves, and the number and size of individual colonies. Overall, we found the predictions to be accurate using this approach. The implications this result has on designing autonomous experiments in microbiology, and the steps that can be taken to make the predictions even more accurate, are discussed.  ( 2 min )
    Subgroup discovery of Parkinson's Disease by utilizing a multi-modal smart device system. (arXiv:2205.05961v1 [cs.LG])
    In recent years, sensors from smart consumer devices have shown great diagnostic potential in movement disorders. In this context, data modalities such as electronic questionnaires, hand movement and voice captures have successfully captured biomarkers and allowed discrimination between Parkinson's disease (PD) and healthy controls (HC) or differential diagnosis (DD). However, to the best of our knowledge, a comprehensive evaluation of assessments with a multi-modal smart device system has still been lacking. In a prospective study exploring PD, we used smartwatches and smartphones to collect multi-modal data from 504 participants, including PD patients, DD and HC. This study aims to assess the effect of multi-modal vs. single-modal data on PD vs. HC and PD vs. DD classification, as well as on PD group clustering for subgroup identification. We were able to show that by combining various modalities, classification accuracy improved and further PD clusters were discovered.  ( 2 min )
    Topologically-Aware Deformation Fields for Single-View 3D Reconstruction. (arXiv:2205.06267v1 [cs.CV])
    We present a new framework for learning 3D object shapes and dense cross-object 3D correspondences from just an unaligned category-specific image collection. The 3D shapes are generated implicitly as deformations to a category-specific signed distance field and are learned in an unsupervised manner solely from unaligned image collections without any 3D supervision. Generally, image collections on the internet contain several intra-category geometric and topological variations, for example, different chairs can have different topologies, which makes the task of joint shape and correspondence estimation much more challenging. Because of this, prior works either focus on learning each 3D object shape individually without modeling cross-instance correspondences or perform joint shape and correspondence estimation on categories with minimal intra-category topological variations. We overcome these restrictions by learning a topologically-aware implicit deformation field that maps a 3D point in the object space to a higher dimensional point in the category-specific canonical space. At inference time, given a single image, we reconstruct the underlying 3D shape by first implicitly deforming each 3D point in the object space to the learned category-specific canonical space using the topologically-aware deformation field and then reconstructing the 3D shape as a canonical signed distance field. Both canonical shape and deformation field are learned end-to-end in an inverse-graphics fashion using a learned recurrent ray marcher (SRN) as a differentiable rendering module. Our approach, dubbed TARS, achieves state-of-the-art reconstruction fidelity on several datasets: ShapeNet, Pascal3D+, CUB, and Pix3D chairs. Result videos and code at https://shivamduggal4.github.io/tars-3D/  ( 2 min )
    Data Distributional Properties Drive Emergent Few-Shot Learning in Transformers. (arXiv:2205.05055v2 [cs.AI] CROSS LISTED)
    Large transformer-based language models are able to perform few-shot learning (also known as in-context learning), without having been explicitly trained for it. We hypothesized that specific distributional properties of natural language might drive this emergent phenomenon, as these characteristics might lead to a kind of interpolation between few-shot meta-training (designed to elicit rapid few-shot learning) and standard supervised training (designed to elicit gradual in-weights learning). We also hypothesized that these distributional properties could lead to emergent few-shot learning in domains outside of language. Inspired by this idea, we ran a series of experiments on a standard image-based few-shot dataset. We discovered that a number of data properties did indeed promote the emergence of few-shot learning in transformer models. All of these properties are present in natural language -- burstiness, long-tailedness, and many-to-one or one-to-many label mappings. The data influenced whether models were biased towards either few-shot learning vs. memorizing information in their weights; models could generally perform well at only one or the other. However, we discovered that an additional distributional property could allow the two capabilities to co-exist in the same model -- a skewed, Zipfian distribution over classes -- which occurs in language as well. Notably, training data that could elicit few-shot learning in transformers were unable to elicit few-shot learning in recurrent models. In sum, we find that few-shot learning emerges only from applying the right architecture to the right data distribution; neither component is sufficient on its own.  ( 2 min )
    Automatic Segmentation of Head and Neck Tumor: How Powerful Transformers Are?. (arXiv:2201.06251v2 [eess.IV] UPDATED)
    Cancer is one of the leading causes of death worldwide, and head and neck (H&N) cancer is amongst the most prevalent types. Positron emission tomography and computed tomography are used to detect, segment and quantify the tumor region. Clinically, tumor segmentation is extensively time-consuming and prone to error. Machine learning, and deep learning in particular, can assist to automate this process, yielding results as accurate as the results of a clinician. In this paper, we investigate a vision transformer-based method to automatically delineate H&N tumor, and compare its results to leading convolutional neural network (CNN)-based models. We use multi-modal data from CT and PET scans to perform the segmentation task. We show that a solution with a transformer-based model has the potential to achieve comparable results to CNN-based ones. With cross validation, the model achieves a mean dice similarity coefficient (DSC) of 0.736, mean precision of 0.766 and mean recall of 0.766. This is only 0.021 less than the 2020 competition winning model (cross validated in-house) in terms of the DSC score. On the testing set, the model performs similarly, with DSC of 0.736, precision of 0.773, and recall of 0.760, which is only 0.023 lower in DSC than the 2020 competition winning model. This work shows that cancer segmentation via transformer-based models is a promising research area to further explore.  ( 2 min )
    kNN-Embed: Locally Smoothed Embedding Mixtures For Multi-interest Candidate Retrieval. (arXiv:2205.06205v1 [cs.IR])
    Candidate generation is the first stage in recommendation systems, where a light-weight system is used to retrieve potentially relevant items for an input user. These candidate items are then ranked and pruned in later stages of recommender systems using a more complex ranking model. Since candidate generation is the top of the recommendation funnel, it is important to retrieve a high-recall candidate set to feed into downstream ranking models. A common approach for candidate generation is to leverage approximate nearest neighbor (ANN) search from a single dense query embedding; however, this approach can yield a low-diversity result set with many near duplicates. As users often have multiple interests, candidate retrieval should ideally return a diverse set of candidates reflective of the user's multiple interests. To this end, we introduce kNN-Embed, a general approach to improving diversity in dense ANN-based retrieval. kNN-Embed represents each user as a smoothed mixture over learned item clusters that represent distinct 'interests' of the user. By querying each of a user's mixture components in proportion to their mixture weights, we retrieve a high-diversity set of candidates reflecting elements from each of a user's interests. We experimentally compare kNN-Embed to standard ANN candidate retrieval, and show significant improvements in overall recall and improved diversity across three datasets. Accompanying this work, we open source a large Twitter follow-graph dataset, to spur further research in graph-mining and representation learning for recommender systems.  ( 2 min )
    A Closer Look at Audio-Visual Multi-Person Speech Recognition and Active Speaker Selection. (arXiv:2205.05684v1 [eess.AS])
    Audio-visual automatic speech recognition is a promising approach to robust ASR under noisy conditions. However, up until recently it had been traditionally studied in isolation assuming the video of a single speaking face matches the audio, and selecting the active speaker at inference time when multiple people are on screen was put aside as a separate problem. As an alternative, recent work has proposed to address the two problems simultaneously with an attention mechanism, baking the speaker selection problem directly into a fully differentiable model. One interesting finding was that the attention indirectly learns the association between the audio and the speaking face even though this correspondence is never explicitly provided at training time. In the present work we further investigate this connection and examine the interplay between the two problems. With experiments involving over 50 thousand hours of public YouTube videos as training data, we first evaluate the accuracy of the attention layer on an active speaker selection task. Secondly, we show under closer scrutiny that an end-to-end model performs at least as well as a considerably larger two-step system that utilizes a hard decision boundary under various noise conditions and number of parallel face tracks.  ( 2 min )
    Image Segmentation with Topological Priors. (arXiv:2205.06197v1 [cs.CV])
    Solving segmentation tasks with topological priors proved to make fewer errors in fine-scale structures. In this work, we use topological priors both before and during the deep neural network training procedure. We compared the results of the two approaches with simple segmentation on various accuracy metrics and the Betti number error, which is directly related to topological correctness, and discovered that incorporating topological information into the classical UNet model performed significantly better. We conducted experiments on the ISBI EM segmentation dataset.  ( 2 min )
    Fighting Money Laundering with Statistics and Machine Learning: An Introduction and Review. (arXiv:2201.04207v3 [stat.ML] UPDATED)
    Money laundering is a profound global problem. Nonetheless, there is little statistical and machine learning research on the topic. In this paper, we focus on anti-money laundering in banks. To help organize existing research, we propose a unifying terminology and provide a review of the literature. This is structured around two central tasks: (i) client risk profiling and (ii) suspicious behavior flagging. We find that client risk profiling is characterized by diagnostics, i.e., efforts to find and explain risk factors. Suspicious behavior flagging, on the other hand, is characterized by non-disclosed features and hand-crafted risk indices. Finally, we discuss directions for future research. One major challenge is a lack of public data sets. This may, potentially, be addressed by synthetic data generation. Other possible research directions include semi-supervised and deep learning, interpretability, and fairness of the results.  ( 2 min )
    SUPER-ADAM: Faster and Universal Framework of Adaptive Gradients. (arXiv:2106.08208v10 [math.OC] UPDATED)
    Adaptive gradient methods have shown excellent performances for solving many machine learning problems. Although multiple adaptive gradient methods were recently studied, they mainly focus on either empirical or theoretical aspects and also only work for specific problems by using some specific adaptive learning rates. Thus, it is desired to design a universal framework for practical algorithms of adaptive gradients with theoretical guarantee to solve general problems. To fill this gap, we propose a faster and universal framework of adaptive gradients (i.e., SUPER-ADAM) by introducing a universal adaptive matrix that includes most existing adaptive gradient forms. Moreover, our framework can flexibly integrate the momentum and variance reduced techniques. In particular, our novel framework provides the convergence analysis support for adaptive gradient methods under the nonconvex setting. In theoretical analysis, we prove that our SUPER-ADAM algorithm can achieve the best known gradient (i.e., stochastic first-order oracle (SFO)) complexity of $\tilde{O}(\epsilon^{-3})$ for finding an $\epsilon$-stationary point of nonconvex optimization, which matches the lower bound for stochastic smooth nonconvex optimization. In numerical experiments, we employ various deep learning tasks to validate that our algorithm consistently outperforms the existing adaptive algorithms. Code is available at https://github.com/LIJUNYI95/SuperAdam  ( 2 min )
    One Model, Multiple Modalities: A Sparsely Activated Approach for Text, Sound, Image, Video and Code. (arXiv:2205.06126v1 [cs.CL])
    People perceive the world with multiple senses (e.g., through hearing sounds, reading words and seeing objects). However, most existing AI systems only process an individual modality. This paper presents an approach that excels at handling multiple modalities of information with a single model. In our "SkillNet" model, different parts of the parameters are specialized for processing different modalities. Unlike traditional dense models that always activate all the model parameters, our model sparsely activates parts of the parameters whose skills are relevant to the task. Such model design enables SkillNet to learn skills in a more interpretable way. We develop our model for five modalities including text, image, sound, video and code. Results show that SkillNet performs comparably to five modality-specific fine-tuned models. Moreover, our model supports self-supervised pretraining in the same sparsely activated way, resulting in better initialized parameters for different modalities. We find that pretraining significantly improves the performance of SkillNet on five modalities, on par with or even better than baselines with modality-specific pretraining. On the task of Chinese text-to-image retrieval, our final system achieves higher accuracy than existing leading systems including Wukong ViT-B and Wenlan 2.0 while using fewer activated parameters.  ( 2 min )
    Controlling chaotic itinerancy in laser dynamics for reinforcement learning. (arXiv:2205.05987v1 [physics.optics])
    Photonic artificial intelligence has attracted considerable interest in accelerating machine learning; however, the unique optical properties have not been fully utilized for achieving higher-order functionalities. Chaotic itinerancy, with its spontaneous transient dynamics among multiple quasi-attractors, can be employed to realize brain-like functionalities. In this paper, we propose a method for controlling the chaotic itinerancy in a multi-mode semiconductor laser to solve a machine learning task, known as the multi-armed bandit problem, which is fundamental to reinforcement learning. The proposed method utilizes ultrafast chaotic itinerant motion in mode competition dynamics controlled via optical injection. We found that the exploration mechanism is completely different from a conventional searching algorithm and is highly scalable, outperforming the conventional approaches for large-scale bandit problems. This study paves the way to utilize chaotic itinerancy for effectively solving complex machine learning tasks as photonic hardware accelerators.  ( 2 min )
    Algebraic Machine Learning with an Application to Chemistry. (arXiv:2205.05795v1 [math.AG])
    As data used in scientific applications become more complex, studying their geometry and topology has become an increasingly prevalent part of the data analysis process. This can be seen for example with the growing interest in topological tools such as persistent homology. However, on the one hand, topological tools are inherently limited to providing only coarse information about the underlying space of the data. On the other hand, more geometric approaches rely predominantly on the manifold hypothesis, which asserts that the underlying space is a smooth manifold. This assumption fails for many physical models where the underlying space contains singularities. In this paper we develop a machine learning pipeline that captures fine-grain geometric information without having to rely on any smoothness assumptions. Our approach involves working within the scope of algebraic geometry and algebraic varieties instead of differential geometry and smooth manifolds. In the setting of the variety hypothesis, the learning problem becomes to find the underlying variety using sample data. We cast this learning problem into a Maximum A Posteriori optimization problem which we solve in terms of an eigenvalue computation. Having found the underlying variety, we explore the use of Gröbner bases and numerical methods to reveal information about its geometry. In particular, we propose a heuristic for numerically detecting points lying near the singular locus of the underlying variety.  ( 2 min )
    Privacy-Preserving Distributed Machine Learning Made Faster. (arXiv:2205.05825v1 [cs.CR])
    With the development of machine learning, it is difficult for a single server to process all the data. So machine learning tasks need to be spread across multiple servers, turning the centralized machine learning into a distributed one. However, privacy remains an unsolved problem in distributed machine learning. Multi-key homomorphic encryption is one of the suitable candidates to solve the problem. However, the most recent result of the Multi-key homomorphic encryption scheme (MKTFHE) only supports the NAND gate. Although it is Turing complete, it requires efficient encapsulation of the NAND gate to further support mathematical calculation. This paper designs and implements a series of operations on positive and negative integers accurately. First, we design basic bootstrapped gates with the same efficiency as that of the NAND gate. Second, we construct practical $k$-bit complement mathematical operators based on our basic binary bootstrapped gates. The constructed operators can perform addition, subtraction, multiplication, and division on both positive and negative integers. Finally, we demonstrated the generality of the designed operators by achieving a distributed privacy-preserving machine learning algorithm, i.e., linear regression with two different solutions. Experiments show that the operators we designed are practical and efficient.  ( 2 min )
    Long Story Short: Omitted Variable Bias in Causal Machine Learning. (arXiv:2112.13398v3 [econ.EM] UPDATED)
    We derive general, yet simple, sharp bounds on the size of the omitted variable bias for a broad class of causal parameters that can be identified as linear functionals of the conditional expectation function of the outcome. Such functionals encompass many of the traditional targets of investigation in causal inference studies, such as, for example, (weighted) average of potential outcomes, average treatment effects (including subgroup effects, such as the effect on the treated), (weighted) average derivatives, and policy effects from shifts in covariate distribution -- all for general, nonparametric causal models. Our construction relies on the Riesz-Frechet representation of the target functional. Specifically, we show how the bound on the bias depends only on the additional variation that the latent variables create both in the outcome and in the Riesz representer for the parameter of interest. Moreover, in many important cases (e.g., average treatment effects and average derivatives) the bound is shown to depend on easily interpretable quantities that measure the explanatory power of the omitted variables. Therefore, simple plausibility judgments on the maximum explanatory power of omitted variables (in explaining treatment and outcome variation) are sufficient to place overall bounds on the size of the bias. Furthermore, we use debiased machine learning to provide flexible and efficient statistical inference on learnable components of the bounds. Finally, empirical examples demonstrate the usefulness of the approach.  ( 2 min )
    Orthogonal Gromov-Wasserstein Discrepancy with Efficient Lower Bound. (arXiv:2205.05838v1 [cs.LG])
    Comparing structured data from possibly different metric-measure spaces is a fundamental task in machine learning, with applications in, e.g., graph classification. The Gromov-Wasserstein (GW) discrepancy formulates a coupling between the structured data based on optimal transportation, tackling the incomparability between different structures by aligning the intra-relational geometries. Although efficient local solvers such as conditional gradient and Sinkhorn are available, the inherent non-convexity still prevents a tractable evaluation, and the existing lower bounds are not tight enough for practical use. To address this issue, we take inspiration from the connection with the quadratic assignment problem, and propose the orthogonal Gromov-Wasserstein (OGW) discrepancy as a surrogate of GW. It admits an efficient and closed-form lower bound with the complexity of $\mathcal{O}(n^3)$, and directly extends to the fused Gromov-Wasserstein (FGW) distance, incorporating node features into the coupling. Extensive experiments on both the synthetic and real-world datasets show the tightness of our lower bounds, and both OGW and its lower bounds efficiently deliver accurate predictions and satisfactory barycenters for graph sets.  ( 2 min )
    GAN-DUF: Hierarchical Deep Generative Models for Design Under Free-Form Geometric Uncertainty. (arXiv:2202.10558v3 [cs.CE] UPDATED)
    Deep generative models have demonstrated effectiveness in learning compact and expressive design representations that significantly improve geometric design optimization. However, these models do not consider the uncertainty introduced by manufacturing or fabrication. Past work that quantifies such uncertainty often makes simplifying assumptions on geometric variations, while the "real-world", "free-form" uncertainty and its impact on design performance are difficult to quantify due to the high dimensionality. To address this issue, we propose a Generative Adversarial Network-based Design under Uncertainty Framework (GAN-DUF), which contains a deep generative model that simultaneously learns a compact representation of nominal (ideal) designs and the conditional distribution of fabricated designs given any nominal design. This opens up new possibilities of 1) building a universal uncertainty quantification model compatible with both shape and topological designs, 2) modeling free-form geometric uncertainties without the need to make any assumptions on the distribution of geometric variability, and 3) allowing fast prediction of uncertainties for new nominal designs. We can combine the proposed deep generative model with robust design optimization or reliability-based design optimization for design under uncertainty. We demonstrated the framework on two real-world engineering design examples and showed its capability of finding the solution that possesses better performances after fabrication.  ( 2 min )
    Leveraging Uncertainty for Deep Interpretable Classification and Weakly-Supervised Segmentation of Histology Images. (arXiv:2205.05841v1 [eess.IV])
    Trained using only image class labels, deep weakly supervised methods allow image classification and ROI segmentation for interpretability. Despite their success on natural images, they face several challenges over histology data where ROI are visually similar to background, making models vulnerable to high pixel-wise false positives. These methods lack mechanisms for explicitly modeling non-discriminative regions, which raises false-positive rates. We propose novel regularization terms, which enable the model to seek both non-discriminative and discriminative regions, while discouraging unbalanced segmentations and using only image class labels. Our method is composed of two networks: a localizer that yields a segmentation mask, followed by a classifier. The training loss pushes the localizer to build a segmentation mask that holds the most discriminative regions while simultaneously modeling background regions. Comprehensive experiments over two histology datasets showed the merits of our method in reducing false positives and accurately segmenting ROI.  ( 2 min )
    Comments on: "Hybrid Semiparametric Bayesian Networks". (arXiv:2205.05910v1 [stat.ME])
    Invited discussion on the paper "Hybrid Semiparametric Bayesian Networks" by David Atienza, Pedro Larranaga and Concha Bielza (TEST, 2022).  ( 2 min )
    Accounting for the Sequential Nature of States to Learn Features for Reinforcement Learning. (arXiv:2205.06000v1 [cs.LG])
    In this work, we investigate the properties of data that cause popular representation learning approaches to fail. In particular, we find that in environments where states do not significantly overlap, variational autoencoders (VAEs) fail to learn useful features. We demonstrate this failure in a simple gridworld domain, and then provide a solution in the form of metric learning. However, metric learning requires supervision in the form of a distance function, which is absent in reinforcement learning. To overcome this, we leverage the sequential nature of states in a replay buffer to approximate a distance metric and provide a weak supervision signal, under the assumption that temporally close states are also semantically similar. We modify a VAE with triplet loss and demonstrate that this approach is able to learn useful features for downstream tasks, without additional supervision, in environments where standard VAEs fail.  ( 2 min )
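The weak supervision signal described in this abstract, where temporally close states in a replay buffer are treated as similar, can be sketched as follows. This is an illustration under stated assumptions rather than the paper's method: the helper names, the `window` and `margin` parameters, and the use of squared Euclidean distance are hypothetical, the buffer is assumed to hold one contiguous trajectory, and plain vectors stand in for the encoder outputs of a VAE.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: pull the positive closer than the negative."""
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(0.0, gap + margin)


def sample_triplet_indices(buffer_length, window=1, rng=random):
    """Pick (anchor, positive, negative) indices from a sequential buffer.

    The positive lies within `window` steps of the anchor (temporally close),
    the negative lies farther away. Assumes a single contiguous trajectory.
    """
    if window < 1 or buffer_length < 2 * window + 2:
        raise ValueError("buffer too short for the requested window")
    anchor = rng.randrange(buffer_length)
    near = [i for i in range(buffer_length) if i != anchor and abs(i - anchor) <= window]
    far = [i for i in range(buffer_length) if abs(i - anchor) > window]
    return anchor, rng.choice(near), rng.choice(far)


if __name__ == "__main__":
    # Toy embeddings indexed by time step; a real setup would encode states first.
    embeddings = [[float(t), 0.0] for t in range(10)]
    a, p, n = sample_triplet_indices(len(embeddings), window=1)
    print(triplet_loss(embeddings[a], embeddings[p], embeddings[n]))
```

In a real replay buffer, episode boundaries would need to be respected so that states from different episodes are not treated as temporal neighbors.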
    Secure Aggregation for Federated Learning in Flower. (arXiv:2205.06117v1 [cs.LG])
    Federated Learning (FL) allows parties to learn a shared prediction model by delegating the training computation to clients and aggregating all the separately trained models on the server. To prevent private information being inferred from local models, Secure Aggregation (SA) protocols are used to ensure that the server is unable to inspect individual trained models as it aggregates them. However, current implementations of SA in FL frameworks have limitations, including vulnerability to client dropouts or configuration difficulties. In this paper, we present Salvia, an implementation of SA for Python users in the Flower FL framework. Based on the SecAgg(+) protocols for a semi-honest threat model, Salvia is robust against client dropouts and exposes a flexible and easy-to-use API that is compatible with various machine learning frameworks. We show that Salvia's experimental performance is consistent with SecAgg(+)'s theoretical computation and communication complexities.  ( 2 min )
    A Generalist Agent. (arXiv:2205.06175v1 [cs.AI])
    Inspired by progress in large-scale language modeling, we apply a similar approach towards building a single generalist agent beyond the realm of text outputs. The agent, which we refer to as Gato, works as a multi-modal, multi-task, multi-embodiment generalist policy. The same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm and much more, deciding based on its context whether to output text, joint torques, button presses, or other tokens. In this report we describe the model and the data, and document the current capabilities of Gato.  ( 2 min )
    Reducing Overconfidence Predictions for Autonomous Driving Perception. (arXiv:2202.07825v2 [cs.CV] UPDATED)
    In state-of-the-art deep learning for object recognition, SoftMax and Sigmoid functions are most commonly employed as the predictor outputs. Such layers often produce overconfident predictions rather than proper probabilistic scores, which can thus harm the decision-making of 'critical' perception systems applied in autonomous driving and robotics. Given this, the experiments in this work propose a probabilistic approach based on distributions calculated out of the Logit layer scores of pre-trained networks. We demonstrate that Maximum Likelihood (ML) and Maximum a-Posteriori (MAP) functions are more suitable for probabilistic interpretations than SoftMax and Sigmoid-based predictions for object recognition. We explore distinct sensor modalities via RGB images and LiDARs (RV: range-view) data from the KITTI and Lyft Level-5 datasets, where our approach shows promising performance compared to the usual SoftMax and Sigmoid layers, with the benefit of enabling interpretable probabilistic predictions. Another advantage of the approach introduced in this paper is that the ML and MAP functions can be implemented in existing trained networks, that is, the approach benefits from the output of the Logit layer of pre-trained networks. Thus, there is no need to carry out a new training phase since the ML and MAP functions are used in the test/prediction phase.  ( 2 min )
    Augmenting Pre-trained Language Models with QA-Memory for Open-Domain Question Answering. (arXiv:2204.04581v2 [cs.CL] UPDATED)
    Retrieval augmented language models have recently become the standard for knowledge intensive tasks. Rather than relying purely on latent semantics within the parameters of large neural models, these methods enlist a semi-parametric memory to encode an index of knowledge for the model to retrieve over. Most prior work has employed text passages as the unit of knowledge, which has high coverage at the cost of interpretability, controllability, and efficiency. The opposite properties arise in other methods which have instead relied on knowledge base (KB) facts. At the same time, more recent work has demonstrated the effectiveness of storing and retrieving from an index of Q-A pairs derived from text (Lewis et al., 2021). This approach yields a high coverage knowledge representation that maintains KB-like properties due to its representations being more atomic units of information. In this work we push this line of research further by proposing a question-answer augmented encoder-decoder model and accompanying pretraining strategy. This yields an end-to-end system that not only outperforms prior QA retrieval methods on single-hop QA tasks but also enables compositional reasoning, as demonstrated by strong performance on two multi-hop QA datasets. Together, these methods improve the ability to interpret and control the model while narrowing the performance gap with passage retrieval systems.  ( 2 min )
    Adversarial Estimators. (arXiv:2204.10495v2 [econ.EM] UPDATED)
    We develop an asymptotic theory of adversarial estimators ('A-estimators'). They generalize maximum-likelihood-type estimators ('M-estimators') as their objective is maximized by some parameters and minimized by others. This class subsumes the continuous-updating Generalized Method of Moments, Generative Adversarial Networks and more recent proposals in machine learning and econometrics. In these examples, researchers state which aspects of the problem may in principle be used for estimation, and an adversary learns how to emphasize them optimally. We derive the convergence rates of A-estimators under pointwise and partial identification, and the normality of functionals of their parameters. Unknown functions may be approximated via sieves such as deep neural networks, for which we provide simplified low-level conditions. As a corollary, we obtain the normality of neural-net M-estimators, overcoming technical issues previously identified by the literature. Our theory yields novel results about a variety of A-estimators, providing intuition and formal justification for their success in recent applications.  ( 2 min )
    Dimension-adaptive machine-learning-based quantum state reconstruction. (arXiv:2205.05804v1 [quant-ph])
    We introduce an approach for performing quantum state reconstruction on systems of $n$ qubits using a machine-learning-based reconstruction system trained exclusively on $m$ qubits, where $m\geq n$. This approach removes the necessity of exactly matching the dimensionality of a system under consideration with the dimension of a model used for training. We demonstrate our technique by performing quantum state reconstruction on randomly sampled systems of one, two, and three qubits using machine-learning-based methods trained exclusively on systems containing at least one additional qubit. The reconstruction time required for machine-learning-based methods scales significantly more favorably than the training time; hence this technique can offer an overall savings of resources by leveraging a single neural network for dimension-variable state reconstruction, obviating the need to train dedicated machine-learning systems for each Hilbert space.  ( 2 min )
    eFedDNN: Ensemble based Federated Deep Neural Networks for Trajectory Mode Inference. (arXiv:2205.05756v1 [cs.LG])
    As the most significant data source in smart mobility systems, GPS trajectories can help identify user travel mode. However, these GPS datasets may contain users' private information (e.g., home location), preventing many users from sharing their private information with a third party. Hence, identifying travel modes while protecting users' privacy is a significant issue. To address this challenge, we use federated learning (FL), a privacy-preserving machine learning technique that aims at collaboratively training a robust global model by accessing users' locally trained models but not their raw data. Specifically, we designed a novel ensemble-based Federated Deep Neural Network (eFedDNN). The ensemble method combines the outputs of the different models learned via FL by the users and shows an accuracy that surpasses comparable models reported in the literature. Extensive experimental studies on a real-world open-access dataset from Montreal demonstrate that the proposed inference model can achieve accurate identification of users' mode of travel without compromising privacy.  ( 2 min )
    Cross-domain Few-shot Meta-learning Using Stacking. (arXiv:2205.05831v1 [cs.CV])
    Cross-domain few-shot meta-learning (CDFSML) addresses learning problems where knowledge needs to be transferred from several source domains into an instance-scarce target domain with an explicitly different input distribution. Recently published CDFSML methods generally construct a "universal model" that combines knowledge of multiple source domains into one backbone feature extractor. This enables efficient inference but necessitates re-computation of the backbone whenever a new source domain is added. Moreover, state-of-the-art methods derive their universal model from a collection of backbones -- normally one for each source domain -- and the backbones may be constrained to have the same architecture as the universal model. We propose a CDFSML method that is inspired by the classic stacking approach to meta learning. It imposes no constraints on the backbones' architecture or feature shape and does not incur the computational overhead of (re-)computing a universal model. Given a target-domain task, it fine-tunes each backbone independently, uses cross-validation to extract meta training data from the task's instance-scarce support set, and learns a simple linear meta classifier from this data. We evaluate our stacking approach on the well-known Meta-Dataset benchmark, targeting image classification with convolutional neural networks, and show that it often yields substantially higher accuracy than competing methods.  ( 2 min )
    Open Vocabulary Extreme Classification Using Generative Models. (arXiv:2205.05812v1 [cs.CL])
    The extreme multi-label classification (XMC) task aims at tagging content with a subset of labels from an extremely large label set. The label vocabulary is typically defined in advance by domain experts and assumed to capture all necessary tags. However in real world scenarios this label set, although large, is often incomplete and experts frequently need to refine it. To develop systems that simplify this process, we introduce the task of open vocabulary XMC (OXMC): given a piece of content, predict a set of labels, some of which may be outside of the known tag set. Hence, in addition to not having training data for some labels - as is the case in zero-shot classification - models need to invent some labels on-the-fly. We propose GROOV, a fine-tuned seq2seq model for OXMC that generates the set of labels as a flat sequence and is trained using a novel loss independent of predicted label order. We show the efficacy of the approach, experimenting with popular XMC datasets for which GROOV is able to predict meaningful labels outside the given vocabulary while performing on par with state-of-the-art solutions for known labels.  ( 2 min )
    Deep-Learned Generators of Porosity Distributions Produced During Metal Additive Manufacturing. (arXiv:2205.05794v1 [cs.LG])
    Laser Powder Bed Fusion has become a widely adopted method for metal Additive Manufacturing (AM) due to its ability to mass produce complex parts with increased local control. However, AM produced parts can be subject to undesirable porosity, negatively influencing the properties of printed components. Thus, controlling porosity is integral for creating effective parts. A precise understanding of the porosity distribution is crucial for accurately simulating potential fatigue and failure zones. Previous research on generating synthetic porous microstructures has succeeded in generating parts with high-density, isotropic porosity distributions but is often inapplicable to cases with sparser, boundary-dependent pore distributions. Our work bridges this gap by providing a method that considers these constraints by deconstructing the generation problem into its constitutive parts. A framework is introduced that combines Generative Adversarial Networks with Mallat Scattering Transform-based autocorrelation methods to construct novel realizations of the individual pore geometries and surface roughness, then stochastically reconstruct them to form realizations of a porous printed part. The generated parts are compared to the existing experimental porosity distributions based on statistical and dimensional metrics, such as nearest neighbor distances, pore volumes, pore anisotropies and scattering transform based auto-correlations.  ( 2 min )
    Bridging Model-based Safety and Model-free Reinforcement Learning through System Identification of Low Dimensional Linear Models. (arXiv:2205.05787v1 [cs.RO])
    Bridging model-based safety and model-free reinforcement learning (RL) for dynamic robots is appealing since model-based methods are able to provide formal safety guarantees, while RL-based methods are able to exploit the robot agility by learning from the full-order system dynamics. However, current approaches to tackle this problem are mostly restricted to simple systems. In this paper, we propose a new method to combine model-based safety with model-free reinforcement learning by explicitly finding a low-dimensional model of the system controlled by a RL policy and applying stability and safety guarantees on that simple model. We use a complex bipedal robot Cassie, which is a high dimensional nonlinear system with hybrid dynamics and underactuation, and its RL-based walking controller as an example. We show that a low-dimensional dynamical model is sufficient to capture the dynamics of the closed-loop system. We demonstrate that this model is linear, asymptotically stable, and is decoupled across control input in all dimensions. We further exemplify that such linearity exists even when using different RL control policies. Such results point out an interesting direction to understand the relationship between RL and optimal control: whether RL tends to linearize the nonlinear system during training in some cases. Furthermore, we illustrate that the found linear model is able to provide guarantees by safety-critical optimal control framework, e.g., Model Predictive Control with Control Barrier Functions, on an example of autonomous navigation using Cassie while taking advantage of the agility provided by the RL-based controller.  ( 2 min )
    Distinction Maximization Loss: Efficiently Improving Classification Accuracy, Uncertainty Estimation, and Out-of-Distribution Detection Simply Replacing the Loss and Calibrating. (arXiv:2205.05874v1 [cs.LG])
    Building robust deterministic deep neural networks is still a challenge. On the one hand, some approaches improve out-of-distribution detection at the cost of reducing classification accuracy in some situations. On the other hand, some methods simultaneously increase classification accuracy, out-of-distribution detection, and uncertainty estimation, but reduce inference efficiency, in addition to training the same model many times to tune hyperparameters. In this paper, we propose training deterministic deep neural networks using our DisMax loss, which works as a drop-in replacement for the commonly used SoftMax loss (i.e., the combination of the linear output layer, the SoftMax activation, and the cross-entropy loss). Starting from the IsoMax+ loss, we created novel logits that are based on the distance to all prototypes rather than just the one associated with the correct class. We also propose a novel way to augment images to construct what we call fractional probability regularization. Moreover, we propose a new score to perform out-of-distribution detection and a fast way to calibrate the network after training. Our experiments show that DisMax usually outperforms all current approaches simultaneously in classification accuracy, uncertainty estimation, inference efficiency, and out-of-distribution detection, avoiding hyperparameter tuning and repetitive model training. The code to replace the SoftMax loss with the DisMax loss and reproduce the results in this paper is available at https://github.com/dlmacedo/distinction-maximization-loss.  ( 2 min )
    Deep Learning and Synthetic Media. (arXiv:2205.05764v1 [cs.LG])
    Deep learning algorithms are rapidly changing the way in which audiovisual media can be produced. Synthetic audiovisual media generated with deep learning - often subsumed colloquially under the label "deepfakes" - have a number of impressive characteristics; they are increasingly trivial to produce, and can be indistinguishable from real sounds and images recorded with a sensor. Much attention has been dedicated to ethical concerns raised by this technological development. Here, I focus instead on a set of issues related to the notion of synthetic audiovisual media, its place within a broader taxonomy of audiovisual media, and how deep learning techniques differ from more traditional approaches to media synthesis. After reviewing important etiological features of deep learning pipelines for media manipulation and generation, I argue that "deepfakes" and related synthetic media produced with such pipelines do not merely offer incremental improvements over previous methods, but challenge traditional taxonomical distinctions, and pave the way for genuinely novel kinds of audiovisual media.  ( 2 min )
    Representation Learning for Context-Dependent Decision-Making. (arXiv:2205.05820v1 [cs.LG])
    Humans are capable of adjusting to changing environments flexibly and quickly. Empirical evidence has revealed that representation learning plays a crucial role in endowing humans with such a capability. Inspired by this observation, we study representation learning in the sequential decision-making scenario with contextual changes. We propose an online algorithm that is able to learn and transfer context-dependent representations and show that it significantly outperforms the existing ones that do not learn representations adaptively. As a case study, we apply our algorithm to the Wisconsin Card Sorting Task, a well-established test for the mental flexibility of humans in sequential decision-making. By comparing our algorithm with the standard Q-learning and Deep-Q learning algorithms, we demonstrate the benefits of adaptive representation learning.  ( 2 min )
    A Deep Learning Approach for Predicting Two-dimensional Soil Consolidation Using Physics-Informed Neural Networks (PINN). (arXiv:2205.05710v1 [cs.CE])
    Soil consolidation is closely related to seepage, stability, and settlement of geotechnical buildings and foundations, and directly affects the use and safety of superstructures. Nowadays, the unidirectional consolidation theory of soils is widely used in certain conditions and approximate calculations. The multi-directional theory of soil consolidation is more reasonable than the unidirectional theory in practical applications, but it is much more complicated in terms of index determination and solution. To address the above problem, in this paper, we propose a deep learning method using physics-informed neural networks (PINN) to predict the excess pore water pressure of two-dimensional soil consolidation. In the proposed method, (1) a fully connected neural network is constructed, (2) the computational domain, partial differential equation (PDE), and constraints are defined to generate data for model training, and (3) the PDE of two-dimensional soil consolidation and the model of the neural network are connected to reduce the loss of the model. The effectiveness of the proposed method is verified by comparison with the numerical solution of the PDE for two-dimensional consolidation. Using this method, the excess pore water pressure could be predicted simply and efficiently. In addition, the method was applied to predict the soil excess pore water pressure in the foundation in a real case at Tianjin port, China. The proposed deep learning approach can be used to investigate large and complex multi-directional soil consolidation.  ( 2 min )
    A Survey of Risk-Aware Multi-Armed Bandits. (arXiv:2205.05843v1 [stat.ML])
    In several applications such as clinical trials and financial portfolio optimization, the expected value (or the average reward) does not satisfactorily capture the merits of a drug or a portfolio. In such applications, risk plays a crucial role, and a risk-aware performance measure is preferable, so as to capture losses in the case of adverse events. This survey aims to consolidate and summarise the existing research on risk measures, specifically in the context of multi-armed bandits. We review various risk measures of interest, and comment on their properties. Next, we review existing concentration inequalities for various risk measures. Then, we proceed to defining risk-aware bandit problems. We consider algorithms for the regret minimization setting, where the exploration-exploitation trade-off manifests, as well as the best-arm identification setting, which is a pure exploration problem -- both in the context of risk-sensitive measures. We conclude by commenting on persisting challenges and fertile areas for future research.  ( 2 min )
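    To make the contrast between the average reward and a risk-aware measure concrete, here is a minimal sketch. It assumes conditional value-at-risk (CVaR) as the risk measure, which the abstract does not name; the function `empirical_cvar` and the two reward samples are hypothetical illustrations, not taken from the survey.

```python
import math
from typing import Sequence


def empirical_cvar(rewards: Sequence[float], alpha: float) -> float:
    """Mean of the worst ceil(alpha * n) observed rewards (the lower tail).

    This is one simple empirical estimate of CVaR for rewards; the choice of
    CVaR and of this estimator is an assumption made for illustration.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    if not rewards:
        raise ValueError("rewards must be non-empty")
    k = max(1, math.ceil(alpha * len(rewards)))
    worst = sorted(rewards)[:k]  # the k smallest rewards
    return sum(worst) / k


def mean(rewards: Sequence[float]) -> float:
    return sum(rewards) / len(rewards)


# Hypothetical reward samples for two arms.
arm_a = [1.0] * 10            # steady, modest reward
arm_b = [2.0] * 9 + [-5.0]    # higher average, rare large loss

# The mean prefers arm_b, while CVaR at alpha = 0.2 prefers arm_a.
best_by_mean = "b" if mean(arm_b) > mean(arm_a) else "a"
best_by_cvar = "b" if empirical_cvar(arm_b, 0.2) > empirical_cvar(arm_a, 0.2) else "a"
```

    With these samples the two criteria disagree: the expected value favours the arm with the occasional large loss, whereas the tail average favours the steady arm.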
    RITA: a Study on Scaling Up Generative Protein Sequence Models. (arXiv:2205.05789v1 [q-bio.QM])
    In this work we introduce RITA: a suite of autoregressive generative models for protein sequences, with up to 1.2 billion parameters, trained on over 280 million protein sequences belonging to the UniRef-100 database. Such generative models hold the promise of greatly accelerating protein design. We conduct the first systematic study of how capabilities evolve with model size for autoregressive transformers in the protein domain: we evaluate RITA models in next amino acid prediction, zero-shot fitness, and enzyme function prediction, showing benefits from increased scale. We release the RITA models openly, to the benefit of the research community.  ( 2 min )
    Visualization Guidelines for Model Performance Communication Between Data Scientists and Subject Matter Experts. (arXiv:2205.05749v1 [cs.HC])
    Presenting the complexities of a model's performance is a communication bottleneck that threatens collaborations between data scientists and subject matter experts. Accuracy and error metrics alone fail to tell the whole story of a model - its risks, strengths, and limitations - making it difficult for subject matter experts to feel confident in deciding to use a model. As a result, models may fail in unexpected ways if their weaknesses are not clearly understood. Alternatively, models may go unused, as subject matter experts disregard poorly presented models in favor of familiar, yet arguably substandard methods. In this paper, we propose effective use of visualization as a medium for communication between data scientists and subject matter experts. Our research addresses the gap between common practices in model performance communication and the understanding of subject matter experts and decision makers. We derive a set of communication guidelines and recommended visualizations for communicating model performance based on interviews of both data scientists and subject matter experts at the same organization. We conduct a follow-up study with subject matter experts to evaluate the efficacy of our guidelines in presentations of model performance with and without our recommendations. We find that our proposed guidelines made subject matter experts more aware of the tradeoffs of the presented model. Participants realized that current communication methods left them without a robust understanding of the model's performance, potentially giving them misplaced confidence in the use of the model.  ( 2 min )
    Learning to Guide Multiple Heterogeneous Actors from a Single Human Demonstration via Automatic Curriculum Learning in StarCraft II. (arXiv:2205.05784v1 [cs.LG])
    Traditionally, learning from human demonstrations via direct behavior cloning can lead to high-performance policies given that the algorithm has access to large amounts of high-quality data covering the most likely scenarios to be encountered when the agent is operating. However, in real-world scenarios, expert data is limited and it is desired to train an agent that learns a behavior policy general enough to handle situations that were not demonstrated by the human expert. An alternative is to learn these policies with no supervision via deep reinforcement learning; however, these algorithms require a large amount of computing time to perform well on complex tasks with high-dimensional state and action spaces, such as those found in StarCraft II. Automatic curriculum learning is a recent mechanism comprised of techniques designed to speed up deep reinforcement learning by adjusting the difficulty of the current task to be solved according to the agent's current capabilities. Designing a proper curriculum, however, can be challenging for sufficiently complex tasks, and thus we leverage human demonstrations as a way to guide agent exploration during training. In this work, we aim to train deep reinforcement learning agents that can command multiple heterogeneous actors where starting positions and overall difficulty of the task are controlled by an automatically-generated curriculum from a single human demonstration. Our results show that an agent trained via automated curriculum learning can outperform state-of-the-art deep reinforcement learning baselines and match the performance of the human expert in a simulated command and control task in StarCraft II modeled over a real military scenario.  ( 2 min )
    Stochastic first-order methods for average-reward Markov decision processes. (arXiv:2205.05800v1 [cs.LG])
    We study the problem of average-reward Markov decision processes (AMDPs) and develop novel first-order methods with strong theoretical guarantees for both policy evaluation and optimization. Existing on-policy evaluation methods suffer from sub-optimal convergence rates as well as failure in handling insufficiently random policies, e.g., deterministic policies, for lack of exploration. To remedy these issues, we develop a novel variance-reduced temporal difference (VRTD) method with linear function approximation for randomized policies along with optimal convergence guarantees, and an exploratory variance-reduced temporal difference (EVRTD) method for insufficiently random policies with comparable convergence guarantees. We further establish linear convergence rate on the bias of policy evaluation, which is essential for improving the overall sample complexity of policy optimization. On the other hand, compared with intensive research interest in finite sample analysis of policy gradient methods for discounted MDPs, existing studies on policy gradient methods for AMDPs mostly focus on regret bounds under restrictive assumptions on the underlying Markov processes (see, e.g., Abbasi-Yadkori et al., 2019), and they often lack guarantees on the overall sample complexities. Towards this end, we develop an average-reward variant of the stochastic policy mirror descent (SPMD) (Lan, 2022). We establish the first $\widetilde{\mathcal{O}}(\epsilon^{-2})$ sample complexity for solving AMDPs with policy gradient method under both the generative model (with unichain assumption) and Markovian noise model (with ergodic assumption). This bound can be further improved to $\widetilde{\mathcal{O}}(\epsilon^{-1})$ for solving regularized AMDPs. Our theoretical advantages are corroborated by numerical experiments.  ( 2 min )
    Tiny Robot Learning: Challenges and Directions for Machine Learning in Resource-Constrained Robots. (arXiv:2205.05748v1 [cs.LG])
    Machine learning (ML) has become a pervasive tool across computing systems. An emerging application that stress-tests the challenges of ML system design is tiny robot learning, the deployment of ML on resource-constrained low-cost autonomous robots. Tiny robot learning lies at the intersection of embedded systems, robotics, and ML, compounding the challenges of these domains. Tiny robot learning is subject to challenges from size, weight, area, and power (SWAP) constraints; sensor, actuator, and compute hardware limitations; end-to-end system tradeoffs; and a large diversity of possible deployment scenarios. Tiny robot learning requires ML models to be designed with these challenges in mind, providing a crucible that reveals the necessity of holistic ML system design and automated end-to-end design tools for agile development. This paper gives a brief survey of the tiny robot learning space, elaborates on key challenges, and proposes promising opportunities for future work in ML system design.  ( 2 min )
    Individual Fairness Guarantees for Neural Networks. (arXiv:2205.05763v1 [cs.LG])
    We consider the problem of certifying the individual fairness (IF) of feed-forward neural networks (NNs). In particular, we work with the $\epsilon$-$\delta$-IF formulation, which, given a NN and a similarity metric learnt from data, requires that the output difference between any pair of $\epsilon$-similar individuals is bounded by a maximum decision tolerance $\delta \geq 0$. Working with a range of metrics, including the Mahalanobis distance, we propose a method to overapproximate the resulting optimisation problem using piecewise-linear functions to lower and upper bound the NN's non-linearities globally over the input space. We encode this computation as the solution of a Mixed-Integer Linear Programming problem and demonstrate that it can be used to compute IF guarantees on four datasets widely used for fairness benchmarking. We show how this formulation can be used to encourage models' fairness at training time by modifying the NN loss, and empirically confirm our approach yields NNs that are orders of magnitude fairer than state-of-the-art methods.  ( 2 min )
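    The $\epsilon$-$\delta$-IF requirement described above can be written as a single condition. The symbols $f$ (the network), $d$ (the similarity metric learnt from data), and $\mathcal{X}$ (the input space) are notation assumed here for illustration; the abstract does not fix them, and the choice of the absolute value as the output difference is likewise an assumption.

```latex
% \epsilon-\delta individual fairness, in assumed notation:
% f = network, d = learnt similarity metric, \mathcal{X} = input space.
\forall\, x, x' \in \mathcal{X}: \quad
d(x, x') \le \epsilon \;\Longrightarrow\; \lvert f(x) - f(x') \rvert \le \delta,
\qquad \delta \ge 0 .
```

    Certification then amounts to bounding the largest output difference over all $\epsilon$-similar pairs and checking that the bound does not exceed $\delta$.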
    LSI: A Learned Secondary Index Structure. (arXiv:2205.05769v1 [cs.DB])
    Learned index structures have been shown to achieve favorable lookup performance and space consumption compared to their traditional counterparts such as B-trees. However, most learned index studies have focused on the primary indexing setting, where the base data is sorted. In this work, we investigate whether learned indexes sustain their advantage in the secondary indexing setting. We introduce Learned Secondary Index (LSI), a first attempt to use learned indexes for indexing unsorted data. LSI works by building a learned index over a permutation vector, which allows binary search to be performed on the unsorted base data using random access. We additionally augment LSI with a fingerprint vector to accelerate equality lookups. We show that LSI achieves comparable lookup performance to state-of-the-art secondary indexes while being up to 6x more space efficient.  ( 2 min )
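    The permutation-vector idea can be sketched in a few lines. This is only an illustration of binary search over unsorted base data through a permutation vector: it uses a plain binary search in place of the learned model and omits the fingerprint vector, and the names `build_permutation` and `lookup` are hypothetical rather than taken from the paper.

```python
from typing import List, Sequence


def build_permutation(base: Sequence[int]) -> List[int]:
    """Positions of `base`, ordered by the key stored at each position.

    The base data itself is left unsorted; only this vector is ordered.
    """
    return sorted(range(len(base)), key=lambda i: base[i])


def lookup(base: Sequence[int], perm: Sequence[int], key: int) -> List[int]:
    """Return every position in `base` that holds `key`.

    A lower-bound binary search runs over `perm`; each probe reads the base
    data by random access (base[perm[mid]]). A learned index would instead
    predict a narrow search range; that part is not modelled here.
    """
    lo, hi = 0, len(perm)
    while lo < hi:
        mid = (lo + hi) // 2
        if base[perm[mid]] < key:
            lo = mid + 1
        else:
            hi = mid
    hits = []
    while lo < len(perm) and base[perm[lo]] == key:
        hits.append(perm[lo])
        lo += 1
    return hits


# Hypothetical unsorted column with a duplicate key.
base = [42, 7, 19, 7, 88, 3]
perm = build_permutation(base)
positions_of_7 = lookup(base, perm, 7)
```

    Because only the permutation vector is ordered, the base table keeps its original layout, which is what distinguishes the secondary-index setting from the primary one described in the abstract.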
    Structured, flexible, and robust: benchmarking and improving large language models towards more human-like behavior in out-of-distribution reasoning tasks. (arXiv:2205.05718v1 [cs.CL])
    Human language offers a powerful window into our thoughts -- we tell stories, give explanations, and express our beliefs and goals through words. Abundant evidence also suggests that language plays a developmental role in structuring our learning. Here, we ask: how much of human-like thinking can be captured by learning statistical patterns in language alone? We first contribute a new challenge benchmark for comparing humans and distributional large language models (LLMs). Our benchmark contains two problem-solving domains (planning and explanation generation) and is designed to require generalization to new, out-of-distribution problems expressed in language. We find that humans are far more robust than LLMs on this benchmark. Next, we propose a hybrid Parse-and-Solve model, which augments distributional LLMs with a structured symbolic reasoning module. We find that this model shows more robust adaptation to out-of-distribution planning problems, demonstrating the promise of hybrid AI models for more human-like reasoning.  ( 2 min )
    A time-varying study of Chinese investor sentiment, stock market liquidity and volatility: Based on deep learning BERT model and TVP-VAR model. (arXiv:2205.05719v1 [q-fin.CP])
    Based on the commentary data of the Shenzhen Stock Index bar on the EastMoney website from January 1, 2018 to December 31, 2019, this paper extracts the embedded investor sentiment by using a deep learning BERT model and investigates the time-varying linkage between investor sentiment, stock market liquidity and volatility using a TVP-VAR model. The results show that the impact of investor sentiment on stock market liquidity and volatility is stronger. Although the inverse effect is relatively small, it is more pronounced with the state of the stock market. In all cases, the response is more pronounced in the short term than in the medium to long term, and the impact is asymmetric, with shocks stronger when the market is in a downward spiral.  ( 2 min )
  • Open

    Causal discovery under a confounder blanket. (arXiv:2205.05715v1 [stat.ME])
    Inferring causal relationships from observational data is rarely straightforward, but the problem is especially difficult in high dimensions. For these applications, causal discovery algorithms typically require parametric restrictions or extreme sparsity constraints. We relax these assumptions and focus on an important but more specialized problem, namely recovering a directed acyclic subgraph of variables known to be causally descended from some (possibly large) set of confounding covariates, i.e. a $\textit{confounder blanket}$. This is useful in many settings, for example when studying a dynamic biomolecular subsystem with genetic data providing causally relevant background information. Under a structural assumption that, we argue, must be satisfied in practice if informative answers are to be found, our method accommodates graphs of low or high sparsity while maintaining polynomial time complexity. We derive a sound and complete algorithm for identifying causal relationships under these conditions and implement testing procedures with provable error control for linear and nonlinear systems. We demonstrate our approach on a range of simulation settings.  ( 2 min )
    Probabilistic methods for approximate archetypal analysis. (arXiv:2108.05767v3 [stat.CO] UPDATED)
    Archetypal analysis is an unsupervised learning method for exploratory data analysis. One major challenge that limits the applicability of archetypal analysis in practice is the inherent computational complexity of the existing algorithms. In this paper, we provide a novel approximation approach to partially address this issue. Utilizing probabilistic ideas from high-dimensional geometry, we introduce two preprocessing techniques to reduce the dimension and representation cardinality of the data, respectively. We prove that provided the data is approximately embedded in a low-dimensional linear subspace and the convex hull of the corresponding representations is well approximated by a polytope with a few vertices, our method can effectively reduce the scaling of archetypal analysis. Moreover, the solution of the reduced problem is near-optimal in terms of prediction errors. Our approach can be combined with other acceleration techniques to further mitigate the intrinsic complexity of archetypal analysis. We demonstrate the usefulness of our results by applying our method to summarize several moderately large-scale datasets.  ( 2 min )
    Orthogonal Gromov-Wasserstein Discrepancy with Efficient Lower Bound. (arXiv:2205.05838v1 [cs.LG])
    Comparing structured data from possibly different metric-measure spaces is a fundamental task in machine learning, with applications in, e.g., graph classification. The Gromov-Wasserstein (GW) discrepancy formulates a coupling between the structured data based on optimal transportation, tackling the incomparability between different structures by aligning the intra-relational geometries. Although efficient local solvers such as conditional gradient and Sinkhorn are available, the inherent non-convexity still prevents a tractable evaluation, and the existing lower bounds are not tight enough for practical use. To address this issue, we take inspiration from the connection with the quadratic assignment problem, and propose the orthogonal Gromov-Wasserstein (OGW) discrepancy as a surrogate of GW. It admits an efficient and closed-form lower bound with the complexity of $\mathcal{O}(n^3)$, and directly extends to the fused Gromov-Wasserstein (FGW) distance, incorporating node features into the coupling. Extensive experiments on both the synthetic and real-world datasets show the tightness of our lower bounds, and both OGW and its lower bounds efficiently deliver accurate predictions and satisfactory barycenters for graph sets.  ( 2 min )
    Adversarial Estimators. (arXiv:2204.10495v2 [econ.EM] UPDATED)
    We develop an asymptotic theory of adversarial estimators ('A-estimators'). They generalize maximum-likelihood-type estimators ('M-estimators') as their objective is maximized by some parameters and minimized by others. This class subsumes the continuous-updating Generalized Method of Moments, Generative Adversarial Networks and more recent proposals in machine learning and econometrics. In these examples, researchers state which aspects of the problem may in principle be used for estimation, and an adversary learns how to emphasize them optimally. We derive the convergence rates of A-estimators under pointwise and partial identification, and the normality of functionals of their parameters. Unknown functions may be approximated via sieves such as deep neural networks, for which we provide simplified low-level conditions. As a corollary, we obtain the normality of neural-net M-estimators, overcoming technical issues previously identified by the literature. Our theory yields novel results about a variety of A-estimators, providing intuition and formal justification for their success in recent applications.  ( 2 min )
    Comments on: "Hybrid Semiparametric Bayesian Networks". (arXiv:2205.05910v1 [stat.ME])
    Invited discussion on the paper "Hybrid Semiparametric Bayesian Networks" by David Atienza, Pedro Larranaga and Concha Bielza (TEST, 2022).  ( 2 min )
    Addressing Census data problems in race imputation via fully Bayesian Improved Surname Geocoding and name supplements. (arXiv:2205.06129v1 [stat.ML])
    Prediction of an individual's race and ethnicity plays an important role in social science and public health research. Examples include studies of racial disparity in health and voting. Recently, Bayesian Improved Surname Geocoding (BISG), which uses Bayes' rule to combine information from Census surname files with the geocoding of an individual's residence, has emerged as a leading methodology for this prediction task. Unfortunately, BISG suffers from two Census data problems that contribute to unsatisfactory predictive performance for minorities. First, the decennial Census often contains zero counts for minority racial groups in the Census blocks where some members of those groups reside. Second, because the Census surname files only include frequent names, many surnames -- especially those of minorities -- are missing from the list. To address the zero counts problem, we introduce a fully Bayesian Improved Surname Geocoding (fBISG) methodology that accounts for potential measurement error in Census counts by extending the naive Bayesian inference of the BISG methodology to full posterior inference. To address the missing surname problem, we supplement the Census surname data with additional data on last, first, and middle names taken from the voter files of six Southern states where self-reported race is available. Our empirical validation shows that the fBISG methodology and name supplements significantly improve the accuracy of race imputation across all racial groups, and especially for Asians. The proposed methodology, together with additional name data, is available via the open-source software package wru.  ( 2 min )
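    The Bayes' rule step that BISG relies on can be sketched in a few lines. This is a minimal illustration, not the wru package's implementation: it assumes surname and location are conditionally independent given race, and the function name, group labels, and all numbers are hypothetical. It also shows why a zero block count is a problem, since it forces the posterior for that group to zero.

```python
def bisg_posterior(p_race_given_surname, p_geo_given_race):
    """Combine surname and location evidence with Bayes' rule.

    Assumes surname and location are conditionally independent given race,
    so P(race | surname, geo) is proportional to
    P(race | surname) * P(geo | race).
    """
    unnormalized = {
        race: p_race_given_surname[race] * p_geo_given_race[race]
        for race in p_race_given_surname
    }
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("all groups have zero probability")
    return {race: value / total for race, value in unnormalized.items()}


# Hypothetical inputs, for illustration only (not Census figures).
surname_prior = {"group_a": 0.6, "group_b": 0.3, "group_c": 0.1}
block_likelihood = {"group_a": 0.002, "group_b": 0.001, "group_c": 0.0}

posterior = bisg_posterior(surname_prior, block_likelihood)
# group_c has a zero block count, so its posterior is exactly 0
# regardless of the surname evidence.
```

    The fBISG method described in the abstract replaces the fixed block counts with a full posterior over them; that part is not shown here.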
    Optimal transport weights for causal inference. (arXiv:2109.01991v4 [stat.ME] UPDATED)
    Imbalance in covariate distributions leads to biased estimates of causal effects. Weighting methods attempt to correct this imbalance but rely on specifying models for the treatment assignment mechanism, which is unknown in observational studies. This leaves researchers to choose the proper weighting method and the appropriate covariate functions for these models without knowing the correct combination to achieve distributional balance. In response to these difficulties, we propose a nonparametric generalization of several other weighting schemes found in the literature: Causal Optimal Transport. This new method directly targets distributional balance by minimizing optimal transport distances between treatment and control groups or, more generally, between any source and target population. Our approach is semiparametrically efficient and model-free but can also incorporate moments or any other important functions of covariates that a researcher desires to balance. Moreover, our method can provide nonparametric estimates of the conditional mean outcome function, and we give rates of convergence for this estimator. We also show how this method can provide nonparametric imputations of the missing potential outcomes and give rates of convergence for this estimator. We find that Causal Optimal Transport outperforms competitor methods when both the propensity score and outcome models are misspecified, indicating it is a robust alternative to common weighting methods. Finally, we demonstrate the utility of our method in an external control trial examining the effect of misoprostol versus oxytocin for the treatment of post-partum hemorrhage.  ( 2 min )
    How I failed machine learning in medical imaging -- shortcomings and recommendations. (arXiv:2103.10292v2 [eess.IV] UPDATED)
    Medical imaging is an important research field with many opportunities for improving patients' health. However, there are a number of challenges that are slowing down the progress of the field as a whole, such as optimizing for publication. In this paper we review several problems related to choosing datasets, methods, evaluation metrics, and publication strategies. With a review of literature and our own analysis, we show that at every step, potential biases can creep in. On a positive note, we also see that initiatives to counteract these problems are already being started. Finally we provide a broad range of recommendations on how to further address these problems in the future. For reproducibility, data and code for our analyses are available at https://github.com/GaelVaroquaux/ml_med_imaging_failures  ( 2 min )
    Generating Fair Universal Representations using Adversarial Models. (arXiv:1910.00411v7 [cs.LG] UPDATED)
    We present a data-driven framework for learning fair universal representations (FUR) that guarantee statistical fairness for any learning task that may not be known a priori. Our framework leverages recent advances in adversarial learning to allow a data holder to learn representations in which a set of sensitive attributes are decoupled from the rest of the dataset. We formulate this as a constrained minimax game between an encoder and an adversary where the constraint ensures a measure of usefulness (utility) of the representation. The resulting problem is that of censoring, i.e., finding a representation that is least informative about the sensitive attributes given a utility constraint. For appropriately chosen adversarial loss functions, our censoring framework precisely clarifies the optimal adversarial strategy against strong information-theoretic adversaries; it also achieves the fairness measure of demographic parity for the resulting constrained representations. We evaluate the performance of our proposed framework on both synthetic and publicly available datasets. For these datasets, we use two tradeoff measures: censoring vs. representation fidelity and fairness vs. utility for downstream tasks, to amply demonstrate that multiple sensitive features can be effectively censored even as the resulting fair representations ensure accuracy for multiple downstream tasks.  ( 2 min )
    An $l_1$-oracle inequality for the Lasso in mixture-of-experts regression models. (arXiv:2009.10622v3 [math.ST] UPDATED)
    Mixture-of-experts (MoE) models are a popular framework for modeling heterogeneity in data, for both regression and classification problems in statistics and machine learning, due to their flexibility and the abundance of available statistical estimation and model choice tools. Such flexibility comes from allowing the mixture weights (or gating functions) in the MoE model to depend on the explanatory variables, along with the experts (or component densities). This permits the modeling of data arising from more complex data generating processes when compared to the classical finite mixtures and finite mixtures of regression models, whose mixing parameters are independent of the covariates. The use of MoE models in a high-dimensional setting, when the number of explanatory variables can be much larger than the sample size, is challenging from a computational point of view, and in particular from a theoretical point of view, where the literature is still lacking results for dealing with the curse of dimensionality, for both the statistical estimation and feature selection problems. We consider the finite MoE model with soft-max gating functions and Gaussian experts for high-dimensional regression on heterogeneous data, and its $l_1$-regularized estimation via the Lasso. We focus on the Lasso estimation properties rather than its feature selection properties. We provide a lower bound on the regularization parameter of the Lasso function that ensures an $l_1$-oracle inequality satisfied by the Lasso estimator according to the Kullback--Leibler loss.  ( 2 min )
    A Survey of Risk-Aware Multi-Armed Bandits. (arXiv:2205.05843v1 [stat.ML])
    In several applications such as clinical trials and financial portfolio optimization, the expected value (or the average reward) does not satisfactorily capture the merits of a drug or a portfolio. In such applications, risk plays a crucial role, and a risk-aware performance measure is preferable, so as to capture losses in the case of adverse events. This survey aims to consolidate and summarise the existing research on risk measures, specifically in the context of multi-armed bandits. We review various risk measures of interest, and comment on their properties. Next, we review existing concentration inequalities for various risk measures. Then, we proceed to defining risk-aware bandit problems. We consider algorithms for the regret minimization setting, where the exploration-exploitation trade-off manifests, as well as the best-arm identification setting, which is a pure exploration problem -- both in the context of risk-sensitive measures. We conclude by commenting on persisting challenges and fertile areas for future research.  ( 2 min )
    Low-variance estimation in the Plackett-Luce model via quasi-Monte Carlo sampling. (arXiv:2205.06024v1 [stat.ML])
    The Plackett-Luce (PL) model is ubiquitous in learning-to-rank (LTR) because it provides a useful and intuitive probabilistic model for sampling ranked lists. Counterfactual offline evaluation and optimization of ranking metrics are pivotal for using LTR methods in production. When adopting the PL model as a ranking policy, both tasks require the computation of expectations with respect to the model. These are usually approximated via Monte-Carlo (MC) sampling, since the combinatorial scaling in the number of items to be ranked makes their analytical computation intractable. Despite recent advances in improving the computational efficiency of the sampling process via the Gumbel top-k trick, the MC estimates can suffer from high variance. We develop a novel approach to producing more sample-efficient estimators of expectations in the PL model by combining the Gumbel top-k trick with quasi-Monte Carlo (QMC) sampling, a well-established technique for variance reduction. We illustrate our findings both theoretically and empirically using real-world recommendation data from Amazon Music and the Yahoo learning-to-rank challenge.  ( 2 min )
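    For readers unfamiliar with the Gumbel top-k trick mentioned above, here is a minimal sketch of plain Monte-Carlo sampling from a PL model. It does not include the paper's quasi-Monte Carlo construction; the function names, seed, and scores are hypothetical.

```python
import math
import random


def gumbel_noise(rng):
    """Draw one standard Gumbel sample as -log(-log(U)), U uniform on (0, 1)."""
    u = rng.random()
    while u == 0.0:  # random() can return 0.0, which would break the log
        u = rng.random()
    return -math.log(-math.log(u))


def sample_top_k(log_scores, k, rng):
    """Sample a top-k ranking from a Plackett-Luce model.

    Perturb each item's log-score with independent Gumbel noise and keep
    the k largest; the resulting order follows the PL distribution.
    """
    perturbed = [(score + gumbel_noise(rng), item)
                 for item, score in enumerate(log_scores)]
    perturbed.sort(reverse=True)
    return [item for _, item in perturbed[:k]]


rng = random.Random(0)  # fixed seed; the scores below are made up
log_scores = [2.0, 1.0, 0.5, 0.0]
ranking = sample_top_k(log_scores, k=3, rng=rng)
```

    A single sort replaces k sequential draws without replacement, which is where the computational saving comes from. Each call still uses independent pseudo-random noise, so averages over many such rankings carry ordinary Monte-Carlo variance.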
    Training Uncertainty-Aware Classifiers with Conformalized Deep Learning. (arXiv:2205.05878v1 [stat.ML])
    Deep neural networks are powerful tools to detect hidden patterns in data and leverage them to make predictions, but they are not designed to understand uncertainty and estimate reliable probabilities. In particular, they tend to be overconfident. We address this problem by developing a novel training algorithm that can lead to more dependable uncertainty estimates, without sacrificing predictive power. The idea is to mitigate overconfidence by minimizing a loss function, inspired by advances in conformal inference, that quantifies model uncertainty by carefully leveraging hold-out data. Experiments with synthetic and real data demonstrate this method leads to smaller conformal prediction sets with higher conditional coverage, after exact calibration with hold-out data, compared to state-of-the-art alternatives.  ( 2 min )
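    The "calibration with hold-out data" step can be illustrated with standard split conformal prediction. This sketch shows only that generic calibration step, not the paper's training algorithm or loss; the score definition (one minus the probability assigned to the true class), names, and numbers are assumptions chosen for illustration.

```python
import math


def conformal_threshold(calibration_scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    n = len(calibration_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points: include every class
    return sorted(calibration_scores)[rank - 1]


def prediction_set(class_probs, threshold):
    """Keep every class whose score 1 - p does not exceed the threshold."""
    return [label for label, p in enumerate(class_probs)
            if 1.0 - p <= threshold]


# Hypothetical hold-out scores: 1 - (predicted probability of the true class).
calibration_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
threshold = conformal_threshold(calibration_scores, alpha=0.5)
labels = prediction_set([0.7, 0.2, 0.1], threshold)
```

    An overconfident model tends to produce large calibration scores on its mistakes, which raises the threshold and inflates the prediction sets.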
    An MMSE Lower Bound via Poincaré Inequality. (arXiv:2205.05848v1 [cs.IT])
    This paper studies the minimum mean squared error (MMSE) of estimating $\mathbf{X} \in \mathbb{R}^d$ from the noisy observation $\mathbf{Y} \in \mathbb{R}^k$, under the assumption that the noise (i.e., $\mathbf{Y}|\mathbf{X}$) is a member of the exponential family. The paper provides a new lower bound on the MMSE. Towards this end, an alternative representation of the MMSE is first presented, which is argued to be useful in deriving closed-form expressions for the MMSE. This new representation is then used together with the Poincaré inequality to provide a new lower bound on the MMSE. Unlike, for example, the Cramér-Rao bound, the new bound holds for all possible distributions on the input $\mathbf{X}$. Moreover, the lower bound is shown to be tight in the high-noise regime for the Gaussian noise setting under the assumption that $\mathbf{X}$ is sub-Gaussian. Finally, several numerical examples are shown which demonstrate that the bound performs well in all noise regimes.  ( 2 min )
    Sample Complexity Bounds for Robustly Learning Decision Lists against Evasion Attacks. (arXiv:2205.06127v1 [cs.LG])
    A fundamental problem in adversarial machine learning is to quantify how much training data is needed in the presence of evasion attacks. In this paper we address this issue within the framework of PAC learning, focusing on the class of decision lists. Given that distributional assumptions are essential in the adversarial setting, we work with probability distributions on the input data that satisfy a Lipschitz condition: nearby points have similar probability. Our key results illustrate that the adversary's budget (that is, the number of bits it can perturb on each input) is a fundamental quantity in determining the sample complexity of robust learning. Our first main result is a sample-complexity lower bound: the class of monotone conjunctions (essentially the simplest non-trivial hypothesis class on the Boolean hypercube) and any superclass has sample complexity at least exponential in the adversary's budget. Our second main result is a corresponding upper bound: for every fixed $k$ the class of $k$-decision lists has polynomial sample complexity against a $\log(n)$-bounded adversary. This sheds further light on the question of whether an efficient PAC learning algorithm can always be used as an efficient $\log(n)$-robust learning algorithm under the uniform distribution.  ( 2 min )
    The Mechanism of Prediction Head in Non-contrastive Self-supervised Learning. (arXiv:2205.06226v1 [cs.LG])
    Recently the surprising discovery of the Bootstrap Your Own Latent (BYOL) method by Grill et al. shows the negative term in contrastive loss can be removed if we add the so-called prediction head to the network. This initiated the research of non-contrastive self-supervised learning. It is mysterious why even when there exist trivial collapsed global optimal solutions, neural networks trained by (stochastic) gradient descent can still learn competitive representations. This phenomenon is a typical example of implicit bias in deep learning and remains little understood. In this work, we present our empirical and theoretical discoveries on non-contrastive self-supervised learning. Empirically, we find that when the prediction head is initialized as an identity matrix with only its off-diagonal entries being trainable, the network can learn competitive representations even though the trivial optima still exist in the training objective. Theoretically, we present a framework to understand the behavior of the trainable, but identity-initialized prediction head. Under a simple setting, we characterized the substitution effect and acceleration effect of the prediction head. The substitution effect happens when learning the stronger features in some neurons can substitute for learning these features in other neurons through updating the prediction head. And the acceleration effect happens when the substituted features can accelerate the learning of other weaker features to prevent them from being ignored. These two effects enable the neural networks to learn all the features rather than focus only on learning the stronger features, which is likely the cause of the dimensional collapse phenomenon. To the best of our knowledge, this is also the first end-to-end optimization guarantee for non-contrastive methods using nonlinear neural networks with a trainable prediction head and normalization.  ( 2 min )
    Learning more skills through optimistic exploration. (arXiv:2107.14226v6 [cs.LG] UPDATED)
    Unsupervised skill learning objectives (Gregor et al., 2016, Eysenbach et al., 2018) allow agents to learn rich repertoires of behavior in the absence of extrinsic rewards. They work by simultaneously training a policy to produce distinguishable latent-conditioned trajectories, and a discriminator to evaluate distinguishability by trying to infer latents from trajectories. The hope is for the agent to explore and master the environment by encouraging each skill (latent) to reliably reach different states. However, an inherent exploration problem lingers: when a novel state is actually encountered, the discriminator will necessarily not have seen enough training data to produce accurate and confident skill classifications, leading to low intrinsic reward for the agent and effective penalization of the sort of exploration needed to actually maximize the objective. To combat this inherent pessimism towards exploration, we derive an information gain auxiliary objective that involves training an ensemble of discriminators and rewarding the policy for their disagreement. Our objective directly estimates the epistemic uncertainty that comes from the discriminator not having seen enough training examples, thus providing an intrinsic reward more tailored to the true objective compared to pseudocount-based methods (Burda et al., 2019). We call this exploration bonus discriminator disagreement intrinsic reward, or DISDAIN. We demonstrate empirically that DISDAIN improves skill learning both in a tabular grid world (Four Rooms) and the 57 games of the Atari Suite (from pixels). Thus, we encourage researchers to treat pessimism with DISDAIN.  ( 2 min )
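    One common way to turn ensemble disagreement into a scalar bonus is the entropy of the averaged prediction minus the average of the members' entropies. The sketch below assumes that form; the abstract does not spell out the exact formula, so treat this as an illustration rather than the paper's definition. All names and numbers are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def disagreement_bonus(ensemble_probs):
    """Entropy of the averaged prediction minus the average member entropy."""
    n_members = len(ensemble_probs)
    n_classes = len(ensemble_probs[0])
    mean_probs = [sum(member[c] for member in ensemble_probs) / n_members
                  for c in range(n_classes)]
    mean_entropy = sum(entropy(member) for member in ensemble_probs) / n_members
    return entropy(mean_probs) - mean_entropy


# Two discriminators classifying one state into two skills (made-up numbers).
agree = disagreement_bonus([[0.9, 0.1], [0.9, 0.1]])
disagree = disagreement_bonus([[0.9, 0.1], [0.1, 0.9]])
```

    The bonus vanishes when every member makes the same prediction and grows when confident members contradict each other, which is the situation expected in rarely visited states.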
    The Implicit Bias of Benign Overfitting. (arXiv:2201.11489v3 [cs.LG] UPDATED)
    The phenomenon of benign overfitting, where a predictor perfectly fits noisy training data while attaining low expected loss, has received much attention in recent years, but still remains not fully understood beyond well-specified linear regression setups. In this paper, we provide several new results on when one can or cannot expect benign overfitting to occur, for both regression and classification tasks. We consider a prototypical and rather generic data model for benign overfitting of linear predictors, where an arbitrary input distribution of some fixed dimension $k$ is concatenated with a high-dimensional distribution. For linear regression which is not necessarily well-specified, we show that the minimum-norm interpolating predictor (that standard training methods converge to) is biased towards an inconsistent solution in general, hence benign overfitting will generally not occur. Moreover, we show how this can be extended beyond standard linear regression, by an argument proving how the existence of benign overfitting on some regression problems precludes its existence on other regression problems. We then turn to classification problems, and show that the situation there is much more favorable. Specifically, we prove that the max-margin predictor (to which standard training methods are known to converge in direction) is asymptotically biased towards minimizing a weighted squared hinge loss. This allows us to reduce the question of benign overfitting in classification to the simpler question of whether this loss is a good surrogate for the misclassification error, and use it to show benign overfitting in some new settings.  ( 2 min )
    A non-asymptotic approach for model selection via penalization in high-dimensional mixture of experts models. (arXiv:2104.02640v2 [math.ST] UPDATED)
    Mixture of experts (MoE) are a popular class of statistical and machine learning models that have gained attention over the years due to their flexibility and efficiency. In this work, we consider Gaussian-gated localized MoE (GLoME) and block-diagonal covariance localized MoE (BLoME) regression models to present nonlinear relationships in heterogeneous data with potential hidden graph-structured interactions between high-dimensional predictors. These models pose difficult statistical estimation and model selection questions, both from a computational and theoretical perspective. This paper is devoted to the study of the problem of model selection among a collection of GLoME or BLoME models characterized by the number of mixture components, the complexity of Gaussian mean experts, and the hidden block-diagonal structures of the covariance matrices, in a penalized maximum likelihood estimation framework. In particular, we establish non-asymptotic risk bounds that take the form of weak oracle inequalities, provided that lower bounds for the penalties hold. The good empirical behavior of our models is then demonstrated on synthetic and real datasets.  ( 2 min )
    Long Story Short: Omitted Variable Bias in Causal Machine Learning. (arXiv:2112.13398v3 [econ.EM] UPDATED)
    We derive general, yet simple, sharp bounds on the size of the omitted variable bias for a broad class of causal parameters that can be identified as linear functionals of the conditional expectation function of the outcome. Such functionals encompass many of the traditional targets of investigation in causal inference studies, such as, for example, (weighted) average of potential outcomes, average treatment effects (including subgroup effects, such as the effect on the treated), (weighted) average derivatives, and policy effects from shifts in covariate distribution -- all for general, nonparametric causal models. Our construction relies on the Riesz-Frechet representation of the target functional. Specifically, we show how the bound on the bias depends only on the additional variation that the latent variables create both in the outcome and in the Riesz representer for the parameter of interest. Moreover, in many important cases (e.g., average treatment effects and average derivatives) the bound is shown to depend on easily interpretable quantities that measure the explanatory power of the omitted variables. Therefore, simple plausibility judgments on the maximum explanatory power of omitted variables (in explaining treatment and outcome variation) are sufficient to place overall bounds on the size of the bias. Furthermore, we use debiased machine learning to provide flexible and efficient statistical inference on learnable components of the bounds. Finally, empirical examples demonstrate the usefulness of the approach.  ( 2 min )
    Robustness and Reliability When Training With Noisy Labels. (arXiv:2110.03321v2 [stat.ML] UPDATED)
    Labelling of data for supervised learning can be costly and time-consuming and the risk of incorporating label noise in large data sets is imminent. When training a flexible discriminative model using a strictly proper loss, such noise will inevitably shift the solution towards the conditional distribution over noisy labels. Nevertheless, while deep neural networks have proven capable of fitting random labels, regularisation and the use of robust loss functions empirically mitigate the effects of label noise. However, such observations concern robustness in accuracy, which is insufficient if reliable uncertainty quantification is critical. We demonstrate this by analysing the properties of the conditional distribution over noisy labels for an input-dependent noise model. In addition, we evaluate the set of robust loss functions characterised by noise-insensitive, asymptotic risk minimisers. We find that strictly proper and robust loss functions both offer asymptotic robustness in accuracy, but neither guarantee that the final model is calibrated. Moreover, even with robust loss functions, overfitting is an issue in practice. With these results, we aim to explain observed robustness of common training practices, such as early stopping, to label noise. In addition, we aim to encourage the development of new noise-robust algorithms that not only preserve accuracy but that also ensure reliability.  ( 2 min )
    Kernel Two-Sample Tests in High Dimension: Interplay Between Moment Discrepancy and Dimension-and-Sample Orders. (arXiv:2201.00073v2 [math.ST] UPDATED)
    Motivated by the increasing use of kernel-based metrics for high-dimensional and large-scale data, we study the asymptotic behavior of kernel two-sample tests when the dimension and sample sizes both diverge to infinity. We focus on the maximum mean discrepancy (MMD) using isotropic kernel, including MMD with the Gaussian kernel and the Laplace kernel, and the energy distance as special cases. We derive asymptotic expansions of the kernel two-sample statistics, based on which we establish the central limit theorem (CLT) under both the null hypothesis and the local and fixed alternatives. The new non-null CLT results allow us to perform asymptotic exact power analysis, which reveals a delicate interplay between the moment discrepancy that can be detected by the kernel two-sample tests and the dimension-and-sample orders. The asymptotic theory is further corroborated through numerical studies.  ( 2 min )
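    As a concrete reference for the statistic being studied, here is a minimal sketch of the biased (V-statistic) estimate of squared MMD with a Gaussian kernel. The bandwidth, sample data, and function names are hypothetical, and the paper's asymptotic analysis is not reproduced.

```python
import math


def gaussian_kernel(x, y, bandwidth):
    """Isotropic Gaussian kernel exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between two samples."""
    def mean_kernel(left, right):
        total = sum(gaussian_kernel(a, b, bandwidth)
                    for a in left for b in right)
        return total / (len(left) * len(right))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


# Hypothetical 2-D samples.
sample_a = [(0.0, 0.0), (0.1, -0.1), (-0.2, 0.1)]
sample_b = [(3.0, 3.0), (3.1, 2.9), (2.8, 3.2)]
same = mmd_squared(sample_a, sample_a)
different = mmd_squared(sample_a, sample_b)
```

    A two-sample test rejects the null hypothesis when this statistic is large relative to its null distribution.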
    Fighting Money Laundering with Statistics and Machine Learning: An Introduction and Review. (arXiv:2201.04207v3 [stat.ML] UPDATED)
    Money laundering is a profound global problem. Nonetheless, there is little statistical and machine learning research on the topic. In this paper, we focus on anti-money laundering in banks. To help organize existing research, we propose a unifying terminology and provide a review of the literature. This is structured around two central tasks: (i) client risk profiling and (ii) suspicious behavior flagging. We find that client risk profiling is characterized by diagnostics, i.e., efforts to find and explain risk factors. Suspicious behavior flagging, on the other hand, is characterized by non-disclosed features and hand-crafted risk indices. Finally, we discuss directions for future research. One major challenge is a lack of public data sets. This may, potentially, be addressed by synthetic data generation. Other possible research directions include semi-supervised and deep learning, interpretability, and fairness of the results.  ( 2 min )

  • Open

    [R] Any reference related to regulating the variation of entropies?
    I need some reference papers related to my problem. I have estimations as N normal distributions, but their variance tends to 0. It's because distributions are aggregated to one normal whose variance tends to 0. So, I want some estimations to be sharp and others to be wide based on their confidence. And I thought one way to do that is to constrain the variance of their entropies ... Has anyone seen a related problem? Thanks! submitted by /u/MNhi_ [link] [comments]  ( 1 min )
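    For context on the quantity in question: the differential entropy of a normal distribution depends only on its standard deviation, so the spread of entropies across the N estimates can be computed directly. The sketch below is one possible way to measure that spread and is not taken from the post; the function names and sigma values are hypothetical.

```python
import math


def normal_entropy(sigma):
    """Differential entropy (nats) of N(mu, sigma^2): 0.5 * ln(2*pi*e*sigma^2)."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)


def entropy_variance(sigmas):
    """Population variance of the entropies of several normal estimates."""
    entropies = [normal_entropy(s) for s in sigmas]
    mean = sum(entropies) / len(entropies)
    return sum((h - mean) ** 2 for h in entropies) / len(entropies)


collapsed = entropy_variance([0.01, 0.01, 0.01])  # all equally sharp
mixed = entropy_variance([0.01, 0.5, 2.0])        # some sharp, some wide
```

    Because the entropy tends to negative infinity as sigma tends to 0, any term built on it would likely need a lower bound on sigma to stay numerically stable.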
    [D] Does anyone actually use TFX? (coming off Google I/O)
    The video in question: An introduction to MLOps with TensorFlow Extended (TFX). From reading around reddit it seems like most people don't actually use it, but I thought I'd ask around again. It seems like a great tool, but it seems like a lot of work to set up and when I last looked at it (2019?) it was still messy. Does anyone here use it? If so can you offer your experience with it and more context about you + your company? submitted by /u/iamquah [link] [comments]  ( 1 min )
    [R] Deepmind's Gato: a generalist learning agent
    Hot off the tail of Flamingo, DeepMind has released a report describing a generalist learning agent that works across disparate tasks. Very cool stuff, was hoping to get some discussion on it. Here is their abstract: Inspired by progress in large-scale language modelling, we apply a similar approach towards building a single generalist agent beyond the realm of text outputs. The agent, which we refer to as Gato, works as a multi-modal, multi-task, multi-embodiment generalist policy. The same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm and much more, deciding based on its context whether to output text, joint torques, button presses, or other tokens. In this report we describe the model and the data, and document the current capabilities of Gato. https://www.deepmind.com/publications/a-generalist-agent There are Hacker News comments here: https://news.ycombinator.com/item?id=31355657 submitted by /u/blabboy [link] [comments]  ( 2 min )
    [Discussion] Has anyone deployed FAIR's OPT-175B to Azure? Or runs a demo service?
    Depending on pricing, I am looking to stand up a server on Azure that runs OPT-175B for an external web application for personal usage. So far I have not seen any Docker-equivalent containers or other blogs, walkthroughs, etc, that have done this. Is anyone familiar with this? submitted by /u/thisdavidinspace [link] [comments]  ( 1 min )
    [N] Handling very large models on small setups with Hugging Face's Accelerate library
    Last week Meta AI (FAIR) publicly released huge LMs, with up to ☄️30B parameters. Great win for Open-Source 🎉 These checkpoints are now in 🤗 transformers! But how to use such big checkpoints, without using humongous machines? From our tests, we have seen that the largest models one can load in Colab's GPU RAM (in fp16). We're introducing Accelerate's big model inference: load & use up to the 30B model in colab: - 11B in Colab's free version - 30B in Colab's pro version. This takes advantage of the System RAM, GPU RAM, and disk, splitting parameters across devices. While running on Colab takes time (a significant amount of time for the largest models which are mostly on disk), running with a fast storage device such as an NVME disk is much much faster. Here's a colab notebook if you want to try it out. https://preview.redd.it/cukjxyj1j2z81.png?width=2110&format=png&auto=webp&s=ac348cecde8aa0daa2070b2052e0717994ec2e95 submitted by /u/jikkii [link] [comments]  ( 1 min )
    [D] Introduction to Diffusion Models
    Diffusion Models have gained some impressive ground in the past couple of years, including famously overtaking GANs on image synthesis and being used in DALL-E 2. I wrote this introduction to diffusion models for anyone who is interested in learning more! I get into the mathematical details and lay everything out in (what I hope is) a simple way. If anyone has any questions or comments I'd be happy to discuss! ​ https://i.redd.it/89g0ublbh2z81.gif submitted by /u/SleekEagle [link] [comments]  ( 5 min )
    [N] Upcoming talk by Jacob Devlin, creator of the BERT algorithm, on the future of NLP
    My group is hosting a talk by Google research scientist and NLP guru, Jacob Devlin May 19, 6:30pm ET Natural language processing (NLP) has seen a revolutionary shift in the last 10 years. What was once an academic curiosity has transformed into a commercially viable tool that is routinely used in finance, healthcare, education, and many other fields. Join us as research scientist and NLP pioneer Jacob Devlin discusses the current state of the industry and future trends that are driving the field. We’ll conclude with a Q&A opportunity. If you’re a seasoned NLP researcher or just getting started, this event is for you! About the speaker: Jacob Devlin is a Staff Research Scientist at Google. At Google, his primary research interest is developing fast, powerful, and scalable deep learning models for information retrieval, question answering, and other language understanding tasks. From 2014 to 2017, he worked as a Principal Research Scientist at Microsoft Research. Mr. Devlin received his Master's in Computer Science from the University of Maryland in 2009. This event will be virtual, all are welcome to attend. Agenda: 6:30 Introductions and networking 6:45 Presentation 7:15 Discussion 7:30 Wrap-up submitted by /u/what_comes_next [link] [comments]  ( 1 min )
    [D]Can We Query a Table with T5?
    In this tutorial we are going to do transfer learning with text-to-text generation model T5 by Google with our custom data so that it can convert basic questions to SQL queries. We will add a new task to T5 called: translate English to SQL. https://www.kdnuggets.com/2022/05/query-table-t5.html submitted by /u/mwitiderrick [link] [comments]
    Is this true? Not my comment [D]
    There's no "intelligence" in AI, no "learning" happens in ML, and there's nothing "Neural" about NeuralNets. These are all buzzwords hyped up by researchers to get funding and by VCs to make even more money. Brace for another AI winter submitted by /u/MatterEnough9656 [link] [comments]  ( 3 min )
    Hyperbolic Embeddings: Embeddings for Hierarchical Data [R] [P]
    We know how to create and make use of embeddings, but Hyperbolic Embeddings are an underutilised type of embedding. They achieve much better performance on datasets that have a hierarchical structure such as: words, social networks, the Internet, knowledge graphs, genomic data, images, financial data, and much more. The reason for this is that Hyperbolic spaces have constant negative curvature; this induces very tight connections with tree-like graphs, making them ideal candidates to embed hierarchical data [1]. In the HyperLib library I helped implement the functionality to make use of a type of hyperbolic embedding called Sarkar Embeddings. With these embeddings we first create a tree representation of the data, using an algorithm called TreeRep. This algorithm takes the distances between data points and puts them into a hyperbolic tree, where each node is a data point, and tries to maintain the original distances. We then create embeddings from this tree using Sarkar Embeddings, which can create embeddings from a tree with arbitrarily low distortion. We also made a blog post where you can read more about it and find out how to use it: https://medium.com/@nathan_jf/treerep-and-hyperbolic-embeddings-41312c98b264 The library is here: https://github.com/nalexai/Hyperlib Also, any feedback on usability would be great. [1] Chami, REPRESENTATION LEARNING AND ALGORITHMS IN HYPERBOLIC SPACES submitted by /u/platinumposter [link] [comments]  ( 1 min )
    [D] Do CNN representations really correlate with the pixel space?
    A lot of research is predicated on the idea that CNNs keep the spatial representation of the input pixels in a final encoded CNN output matrix C (e.g., 7x7x2048) before a GAP operation and linear classifier (i.e., the last CNN layer). So, if they, say, upsample a "box" area from C (e.g., the bottom right 1x1x2048) it will correspond to the bottom right of the input pixels (e.g., see this and this). I realise the original class activation mapping (CAM) paper showed it tended to locate objects in a picture well, but surely this is a dangerous assumption? If there's loads of max pooling used throughout the layers these spatial relations will definitely be called into question. Moreover, the 1x1x2048 part I described above would've been calculated using many "areas" outside that box; if you go back to layer 1, I'm sure you could mathematically show that its final representation in C was calculated using nearly all pixels in the image, not just the ones at the bottom right of the input image. I was just thinking about this today, I'm open to being told I'm wrong, but it seems no CAM-like method is flawless at localising objects in an image (e.g., see this). So is this whole line of research just doomed? Surely you cannot rely on this in a real high-stakes application? Thanks if you have time to leave your opinion. submitted by /u/SkeeringReal [link] [comments]  ( 3 min )
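    The "nearly all pixels" intuition in the post above can be checked with the standard receptive-field recurrence. The sketch below is only an illustration: the layer list is a hypothetical VGG-16-like stack (3x3 convolutions with stride 1, 2x2 max pools with stride 2), not the network the poster has in mind, and it computes the theoretical extent only, saying nothing about how strongly each pixel contributes.

```python
def receptive_field(layers):
    """Return the receptive-field size (in input pixels) of one output unit.

    `layers` is a list of (kernel_size, stride) pairs, first layer first.
    """
    size = 1  # before any layer, one unit "sees" exactly one pixel
    jump = 1  # spacing, in input pixels, between adjacent units at this depth
    for kernel, stride in layers:
        size += (kernel - 1) * jump  # each layer widens the view by (k - 1) jumps
        jump *= stride               # striding spreads neighbouring units apart
    return size


# Hypothetical VGG-16-like feature extractor (an assumption for illustration).
vgg_like = (
    [(3, 1)] * 2 + [(2, 2)]
    + [(3, 1)] * 2 + [(2, 2)]
    + [(3, 1)] * 3 + [(2, 2)]
    + [(3, 1)] * 3 + [(2, 2)]
    + [(3, 1)] * 3 + [(2, 2)]
)

rf = receptive_field(vgg_like)
print(f"one cell of the final 7x7 map covers a {rf}x{rf} input window")
```

    For a 224x224 input, a window of that size spans most of the image, which is consistent with the concern that a single cell of C is not computed from its "own" corner alone.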
    [D] Library to transfer PyTorch to TF
    Has anyone of you ever successfully converted a PyTorch model to a Tensorflow model? If so, what can you recommend? There seem to be a few ways, but none works very well and each has its caveats. The lowest-hanging fruit, pytorch_to_keras, is not working for current versions of TF. ONNX seems like an option, since exporting from PyTorch to ONNX is easy and maintained by PyTorch. But looking closely, it's not. onnx_to_keras suffers from the same issues as pytorch_to_keras. Then there is also the "recommended" way of using ONNX's export_graph. This method requires the tf.SavedModel to load, and trainable variables are lost in the process. Which is fine for deploying, but not for fine-tuning. So does anyone have recommendations on how to transfer models between PyTorch and TF? submitted by /u/slater_kelevra [link] [comments]  ( 2 min )
    [P] Swin Transformer V2 codes and models released
    https://github.com/microsoft/Swin-Transformer The ImageNet-22K pretrained Swin-V1-Tiny and Swin-V1-Small models are also released submitted by /u/ancientmooner [link] [comments]
    [D] [P] Finding correlation or clusters in a data-warehouse containing categorical data.
    Hi, I have a lot of structured data that have points that should and are random (i.e. we don’t want a cluster of points in space but are and should be sparse). The data is basically a dataset that had leak issues. The problem statement is to find whether there’s any subset of data with the same root cause (i.e. same brand or same country caused the issue). There’s a mixture of numerical and categorical data, i.e. (size, lat, long) and (country, brand, priority, city). Assume there’s around 20+ dimensions (columns) and my first approach was to find if those points have any cluster. To do so, I was going to use a density based cluster algorithm since you don’t have to specify the number of clusters and it will just ignore noise (which should be most of the points). But it is hard to preserve the meaning of the columns with dimensionality reduction and obviously we can’t cluster in 20 dimensions, it would take forever. We don’t know what could cause the leaks, and we don’t know which column is important or not. What’s an alternative approach or a better solution? Thanks submitted by /u/micdean19 [link] [comments]  ( 1 min )
    [D] What are the heuristics for setting good baselines when experimenting?
    Hello everyone. Master's student here, looking for advice about experimentation. Let's say you want to show that some architectural change or tweak to training technique can improve model performance relative to a "vanilla" baseline. For example, this paper introduces a normalization technique and measures its effectiveness vs batchnorm on Imagenet and COCO. Maybe you're trying to find such an improvement somewhere else. If you aren't trying to push the state of the art, but rather trying to show the validity of an idea that comes from theory or just show that X can work better than Y, you probably aren't going to use layers and layers of training techniques to get the best baseline possible: student-teacher networks, stochastic weight averaging, the endless possible training tricks. But the space of possible things to do to set your baseline is huge. For example, if you have your VGG16 model and cifar10, do you train it with a cyclic learning rate schedule? A flat learning rate schedule? How do you compare your augmentation pipeline to other papers? How much accuracy is enough for you to be content with your baseline? Many papers that report promising results on some architecture design change don't give their training code; how did they know how to set up their training loops and do it? As far as I can tell, there's nowhere you're going to find a guidebook written down that says "here's the learning rates, augmentations, training techniques that give a reasonable baseline on ImageNet, if you have a VGG19 model, a Resnet model, etc." The possibilities are paralyzing. So how do you find a good baseline with so many possible training configurations? submitted by /u/Rawr0s [link] [comments]  ( 1 min )
  • Open

    michaeleldridge77 X elijaheldridge2002
    submitted by /u/VIRUS-AOTOXIN [link] [comments]
    Humans vs. DALL·E — Where do human artists fit in a world of rich, creative AI?
    submitted by /u/ML_Firefighter [link] [comments]  ( 3 min )
    State of the art: Text generation
    Given an input (say: "How can we solve world hunger?"), what is the current state of the art when it comes to ai outputting an answer (or multiple answers). Added to that, what could an enthusiast access? submitted by /u/mister_patience [link] [comments]  ( 1 min )
    Most important AI news from Google I/O 2022: neural rendering for Maps, Deepmind for YouTube, CTRL+F for real life, LaMDA2, new AI test app
    submitted by /u/Zirius_Sadfaces [link] [comments]  ( 1 min )
    Introducing Gato - a generalist agent from DeepMind
    submitted by /u/Yasuuuya [link] [comments]  ( 1 min )
    New major release for nebullvm, an opensource library to speed up AI inference by leveraging state-of-the-art optimization techniques (deep learning compilers, and now also quantization, and soon also sparsity, distillation, etc.)
    nebullvm is an opensource library that generates an optimized version of your deep learning model that runs 2-10 times faster in inference without performance loss by leveraging multiple deep learning compilers (openvino, tensorrt, etc.). And thanks to today's new release, nebullvm can accelerate up to 30x if you specify that you are willing to trade off a self-defined amount of accuracy/precision to get even lower response time and a lighter model. This additional acceleration is achieved by exploiting optimization techniques that slightly modify the model graph to make it lighter, such as quantization, half precision, distillation, sparsity, etc. The goal of nebullvm is to help other developers benefit from the most advanced inference optimization techniques without having to spend countless hours understanding, installing, testing and debugging these powerful technologies. Hoping you enjoy the project, and please give feedback if you have any. You can also find more information (benchmarks, tutorials, notebooks) on github! And happy acceleration :) https://github.com/nebuly-ai/nebullvm submitted by /u/emilec___ [link] [comments]  ( 1 min )
    The superior power of artificial intelligence hiveminds
    So Tesla is making a robot ai. At first it'll just be a tool. And this tool will be limited. But I was thinking, similarly to irobot, this device will have a hive mind (like Tesla cars) and they will share experience and learning. This means if there's 10k robots in the wild they could hypothetically teach each other 10k new skills every half hour or so. For example, if one is cooking and burns a sandwich, the rest will never burn a sandwich under similar scenarios. We can't fathom what a hive mind of ai is really like. Do any of you see any pros and cons to this? Currently we teach ai via millions of iterations as fast as possible and go from there. But imagine a million ais constantly learning and sharing. It's a mystical concept only years away. Thoughts? P.s. I came here because I couldn't find anything on Google about this topic, if you have a paper or talk to share, please do. submitted by /u/NigraOvis [link] [comments]  ( 1 min )
  • Open

    Image classification and object detection using Amazon Rekognition Custom Labels and Amazon SageMaker JumpStart
    In the last decade, computer vision use cases have been a growing trend, especially in industries like insurance, automotive, ecommerce, energy, retail, manufacturing, and others. Customers are building computer vision machine learning (ML) models to bring operational efficiencies and automation to their processes. Such models help automate the classification of images or detection of objects […]  ( 5 min )
    Intelligently search your Jira projects with Amazon Kendra Jira cloud connector
    Organizations use agile project management platforms such as Atlassian Jira to enable teams to collaborate to plan, track, and ship deliverables. Jira captures organizational knowledge about the workings of the deliverables in the issues and comments logged during project implementation. However, making this knowledge easily and securely available to users is challenging due to it […]  ( 7 min )
    The Intel®3D Athlete Tracking (3DAT) scalable architecture deploys pose estimation models using Amazon Kinesis Data Streams and Amazon EKS
    This blog post is co-written by Jonathan Lee, Nelson Leung, Paul Min, and Troy Squillaci from Intel.  In Part 1 of this post, we discussed how Intel®3DAT collaborated with AWS Machine Learning Professional Services (MLPS) to build a scalable AI SaaS application. 3DAT uses computer vision and AI to recognize, track, and analyze over 1,000 […]  ( 17 min )
    Moderate, classify, and process documents using Amazon Rekognition and Amazon Textract
    Many companies are overwhelmed by the abundant volume of documents they have to process, organize, and classify to serve their customers better. Examples of such can be loan applications, tax filing, and billing. Such documents are more commonly received in image formats and are mostly multi-paged and in low-quality format. To be more competitive and […]  ( 8 min )
  • Open

    Common Sense Machine Learning
    There are different ways to define common sense machine learning. It could mean using simple models whenever possible, avoiding overfitting, correctly selecting features, or doing cross-validation the right way. Or it could mean not using any data set. Yet, making predictions far outperforming those from smart teams of data scientists working on large data sets.… Read More »Common Sense Machine Learning The post Common Sense Machine Learning appeared first on Data Science Central.  ( 6 min )
  • Open

    Gato the Generalist Agent
    What are some of your thoughts on the paper(https://dpmd.ai/Gato-paper) by Deepmind that uses a single network to play Atari, caption images, chat, stack blocks with a real robot arm? submitted by /u/blitzkreig3 [link] [comments]  ( 1 min )
    Inverse RL for a non-optimal agent
    Inverse reinforcement learning usually assumes that the agent being learned from is behaving optimally. Has there been work done on using inverse reinforcement learning to learn how an agent learns? submitted by /u/sdrinz [link] [comments]  ( 1 min )
    The Fast Deep RL Course: Learn to Build Powerful Deep RL Agents in Just 4 Hours
    I am happy to announce The Fast Deep RL Course. This course is made for Data Scientists/ML engineers who are excited about Deep RL and are looking for a short and practical introduction to the topic. This course covers everything that you need to get started with practical applications. The course takes advantage of the powerful capabilities of Ray-RLlib, a production grade Deep RL framework. Ray-RLlib's high-level interface can be learned quickly, and will allow you to apply Deep RL in various problems after you finish the course. It also provides a simple path for further learning. Since Ray-RLlib is likely to remain the defacto Deep RL framework in the industry, you don't need to change your tools when you want to learn more. You simply study the lower-level interfaces of the same tool. The course is modeled after the engaging style of Datacamp and Codecademy - consisting of short videos followed by coding exercises, where you can try out what you learned. I have also spent some time to keep it accessible. We will use small neural nets that can be trained on a CPU. A decent laptop is all you need. I have tested the code on various Linux distros, Mac (including M1) and Windows. This means you can simply use your regular OS . All videos come with high-quality captions. The course is free for early supporters (defined as anyone enrolling within the next month). https://preview.redd.it/deyjw1h522z81.png?width=1200&format=png&auto=webp&s=313c8f2ab5369835d67f17fc6d1e9a377860f564 Thanks for trying it out. I will be happy to discuss and answer any questions. submitted by /u/rroocckk [link] [comments]  ( 2 min )
    "Learning Accurate Long-term Dynamics for Model-based Reinforcement Learning", Lambert et al 2020
    submitted by /u/gwern [link] [comments]
    Ray RLlib runs more workers than given.
    I want to start ray tune with just the local worker/learner and give it 12 cpu and one gpu. How to achieve it? I have in config: num_gpus 1 num_cpus_for_driver 12 num_workers 0 However it starts 12 workers with 1 cpu and 0 gpus submitted by /u/Defiant_Sun5579 [link] [comments]  ( 1 min )
    using stable baselines, why do you have to use a lambda function while wrapping an environment
    I have been searching for an hour and can't find it. It seems so silly to me. I have accepted it but would really like to know what the logic behind it is. example: DummyVecEnv([lambda: env]) this works DummyVecEnv([env]) this does not work submitted by /u/Jobdriaan [link] [comments]  ( 1 min )
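    A likely explanation for the question above is that the wrapper expects a list of zero-argument callables that each build an environment, and calls them itself. The sketch below imitates that pattern with hypothetical stand-in classes (ToyEnv and ToyVecEnv are made up for illustration and are not the library's code), so it runs without stable baselines installed.

```python
class ToyEnv:
    """Hypothetical stand-in for a gym-style environment."""

    def reset(self):
        return 0


class ToyVecEnv:
    """Simplified stand-in for a vectorized wrapper (assumption, not library code)."""

    def __init__(self, env_fns):
        # Every entry is *called* to construct an environment,
        # so each entry must be a function, not an environment instance.
        self.envs = [fn() for fn in env_fns]


env = ToyEnv()

vec = ToyVecEnv([lambda: env])  # works: the lambda returns the environment

try:
    ToyVecEnv([env])            # fails: a ToyEnv instance is not callable
    failed = False
except TypeError:
    failed = True

print("passing the bare env failed:", failed)
```

    Taking factories rather than instances also lets a wrapper that runs environments in separate processes build each environment inside its own process, which is one plausible reason for the shared interface.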
  • Open

    Challenges in Multi-objective Optimization for Automatic Wireless Network Planning
    Posted by Sara Ahmadian and Matthew Fahrbach, Research Scientists, Google Research, Large-Scale Optimization Team Economics, combinatorics, physics, and signal processing conspire to make it difficult to design, build, and operate high-quality, cost-effective wireless networks. The radio transceivers that communicate with our mobile phones, the equipment that supports them (such as power and wired networking), and the physical space they occupy are all expensive, so it’s important to be judicious in choosing sites for new transceivers. Even when the set of available sites is limited, there are exponentially many possible networks that can be built. For example, given only 50 sites, there are 2^50 (over a million billion) possibilities! Further complicating things, for every location wher…  ( 8 min )
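    The count quoted for 50 candidate sites can be reproduced with one line of arithmetic, assuming (as the excerpt implies) that each site is independently either built or not, so every subset of sites is one possible network.

```python
num_sites = 50

# Each site is either included or excluded, giving 2 choices per site.
num_networks = 2 ** num_sites

print(f"{num_networks:,} possible networks")
print("more than a million billion:", num_networks > 10 ** 15)
```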
  • Open

    Urban Jungle: AI-Generated Endangered Species Mix With Times Square’s Nightlife
    Bengal tigers, red pandas and mountain gorillas are among the world’s most familiar endangered species, but tens of thousands of others — like the Karpathos frog, the Perote deer mouse or the Mekong giant catfish — are largely unknown. Typically perceived as lacking star quality, these species are now roaming massive billboards in one of Read article > The post Urban Jungle: AI-Generated Endangered Species Mix With Times Square’s Nightlife appeared first on NVIDIA Blog.  ( 4 min )
    GFN Thursday Gets Groovy As ‘Evil Dead: The Game’ Marks 1,300 Games on GeForce NOW
    Good. Bad. You’re the Guy With the Gun this GFN Thursday. Get ready for some horrifyingly good fun with Evil Dead: The Game streaming on GeForce NOW tomorrow at release. It’s the 1,300th game to join GeForce NOW, joining on Friday the 13th. And it’s part of eight total games joining the GeForce NOW library Read article > The post GFN Thursday Gets Groovy As ‘Evil Dead: The Game’ Marks 1,300 Games on GeForce NOW appeared first on NVIDIA Blog.  ( 2 min )
  • Open

    Tool recursion
    “Literature about Lisp rarely resists that narcissistic pleasure of describing Lisp in Lisp.” — Christian Queinnec, Lisp in Small Pieces   Applying software development tools to themselves has a dark side and a light side. There’s a danger of becoming obsessed with one’s tools and never getting around to using them. If it’s your job […] Tool recursion first appeared on John D. Cook.  ( 2 min )
  • Open

    The Baltimore Orioles Effect
    Back when the text-generating neural network GPT-2 was released, OpenAI released it in stages, in part for fear that people might use the more advanced models to generate misinformation. Now in 2022 we do indeed have people passing off AI-written text as human, but rather than being divisive, it’  ( 4 min )
    Bonus: Questionable blog facts
    AI Weirdness: the strange side of machine learning  ( 1 min )
  • Open

    Why Machine Learning Projects Fail- 7 Reasons that can Take Your Efforts for a Ride?
    Your new Machine Learning project is about to fail. Yes, you read that right.  ( 4 min )
  • Open

    Technique protects privacy when making online recommendations
    Researchers devise an efficient protocol to keep a user’s private information secure when algorithms use it to recommend products, songs, or shows.  ( 6 min )
  • Open

    How Do Vision Transformers Work?. (arXiv:2202.06709v3 [cs.CV] UPDATED)
    The success of multi-head self-attentions (MSAs) for computer vision is now indisputable. However, little is known about how MSAs work. We present fundamental explanations to help better understand the nature of MSAs. In particular, we demonstrate the following properties of MSAs and Vision Transformers (ViTs): (1) MSAs improve not only accuracy but also generalization by flattening the loss landscapes. Such improvement is primarily attributable to their data specificity, not long-range dependency. On the other hand, ViTs suffer from non-convex losses. Large datasets and loss landscape smoothing methods alleviate this problem; (2) MSAs and Convs exhibit opposite behaviors. For example, MSAs are low-pass filters, but Convs are high-pass filters. Therefore, MSAs and Convs are complementary; (3) Multi-stage neural networks behave like a series connection of small individual models. In addition, MSAs at the end of a stage play a key role in prediction. Based on these insights, we propose AlterNet, a model in which Conv blocks at the end of a stage are replaced with MSA blocks. AlterNet outperforms CNNs not only in large data regimes but also in small data regimes. The code is available at https://github.com/xxxnell/how-do-vits-work.  ( 2 min )
    Causal Inference Struggles with Agency on Online Platforms. (arXiv:2107.08995v2 [cs.LG] UPDATED)
    Online platforms regularly conduct randomized experiments to understand how changes to the platform causally affect various outcomes of interest. However, experimentation on online platforms has been criticized for having, among other issues, a lack of meaningful oversight and user consent. As platforms give users greater agency, it becomes possible to conduct observational studies in which users self-select into the treatment of interest as an alternative to experiments in which the platform controls whether the user receives treatment or not. In this paper, we conduct four large-scale within-study comparisons on Twitter aimed at assessing the effectiveness of observational studies derived from user self-selection on online platforms. In a within-study comparison, treatment effects from an observational study are assessed based on how effectively they replicate results from a randomized experiment with the same target population. We test the naive difference in group means estimator, exact matching, regression adjustment, and inverse probability of treatment weighting while controlling for plausible confounding variables. In all cases, all observational estimates perform poorly at recovering the ground-truth estimate from the analogous randomized experiments. In all cases except one, the observational estimates have the opposite sign of the randomized estimate. Our results suggest that observational studies derived from user self-selection are a poor alternative to randomized experimentation on online platforms. In discussing our results, we postulate a "Catch-22" that suggests that the success of causal inference in these settings may be at odds with the original motivations for providing users with greater agency.  ( 2 min )
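    To make two of the estimators named in the abstract concrete, here is a minimal sketch of the naive difference in group means and of inverse probability of treatment weighting (IPTW). Everything in it is an assumption for illustration: the data are a synthetic toy population with one binary confounder x, a true treatment effect of 1, and propensities that are known rather than estimated. It is not the paper's code or data.

```python
def naive_difference(units):
    """Mean outcome of treated units minus mean outcome of control units."""
    treated = [y for t, x, y in units if t == 1]
    control = [y for t, x, y in units if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def iptw_estimate(units, propensity):
    """Weight each unit by the inverse probability of the treatment it received."""
    n = len(units)
    treated_term = sum(y / propensity[x] for t, x, y in units if t == 1)
    control_term = sum(y / (1 - propensity[x]) for t, x, y in units if t == 0)
    return (treated_term - control_term) / n


def make_units(x, n_treated, n_control, effect=1.0):
    """Toy units (t, x, y) with outcome y = effect * t + 2 * x."""
    treated = [(1, x, effect + 2.0 * x)] * n_treated
    control = [(0, x, 2.0 * x)] * n_control
    return treated + control


# Units with x = 1 self-select into treatment far more often than units with x = 0.
units = make_units(0, n_treated=2, n_control=8) + make_units(1, n_treated=8, n_control=2)
propensity = {0: 0.2, 1: 0.8}  # assumed known P(treated | x)

naive = naive_difference(units)
iptw = iptw_estimate(units, propensity)
print(f"naive: {naive:.2f}, IPTW: {iptw:.2f}, true effect: 1.00")
```

    In this toy case the naive estimate is inflated by self-selection while IPTW recovers the true effect, but only because the single confounder is observed and the propensities are correct. The abstract's finding is that, on real platform data, such adjustments still performed poorly.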
    How Platform-User Power Relations Shape Algorithmic Accountability: A Case Study of Instant Loan Platforms and Financially Stressed Users in India. (arXiv:2205.05661v1 [cs.HC])
    Accountability, a requisite for responsible AI, can be facilitated through transparency mechanisms such as audits and explainability. However, prior work suggests that the success of these mechanisms may be limited to Global North contexts; understanding the limitations of current interventions in varied socio-political conditions is crucial to help policymakers facilitate wider accountability. To do so, we examined the mediation of accountability in the existing interactions between vulnerable users and a 'high-risk' AI system in a Global South setting. We report on a qualitative study with 29 financially-stressed users of instant loan platforms in India. We found that users experienced intense feelings of indebtedness for the 'boon' of instant loans, and perceived huge obligations towards loan platforms. Users fulfilled obligations by accepting harsh terms and conditions, over-sharing sensitive data, and paying high fees to unknown and unverified lenders. Users demonstrated a dependence on loan platforms by persisting with such behaviors despite risks of harms such as abuse, recurring debts, discrimination, privacy harms, and self-harm to them. Instead of being enraged with loan platforms, users assumed responsibility for their negative experiences, thus releasing the high-powered loan platforms from accountability obligations. We argue that accountability is shaped by platform-user power relations, and urge caution to policymakers in adopting a purely technical approach to fostering algorithmic accountability. Instead, we call for situated interventions that enhance agency of users, enable meaningful transparency, reconfigure designer-user relations, and prompt a critical reflection in practitioners towards wider accountability. We conclude with implications for responsibly deploying AI in FinTech applications in India and beyond.  ( 2 min )
    CMOS Circuits for Shape-Based Analog Machine Learning. (arXiv:2202.05022v1 [cs.ET] CROSS LISTED)
    While analog computing is attractive for implementing machine learning (ML) processors, the paradigm requires chip-in-the-loop training for every processor to alleviate artifacts due to device mismatch and device non-linearity. Speeding up chip-in-the-loop training requires re-biasing the circuits in a manner that the analog functions remain invariant across training and inference. In this paper, we present an analog computational paradigm and circuits using "shape" functions that remain invariant to transistor biasing (weak, moderate, and strong inversion) and ambient temperature variation. We show that a core Shape-based Analog Compute (S-AC) circuit could be re-biased and reused to implement: (a) non-linear functions; (b) inner-product building blocks; and (c) a mixed-signal logarithmic memory, all of which are integral towards designing an ML inference processor. Measured results using a prototype fabricated in a 180nm standard CMOS process demonstrate bias invariance and hence the resulting analog designs can be scaled for power and speed like digital logic circuits. We also demonstrate a regression task using these CMOS building blocks.  ( 2 min )
    Incident duration prediction using a bi-level machine learning framework with outlier removal and intra-extra joint optimisation. (arXiv:2205.05197v1 [cs.LG])
    Predicting the duration of traffic incidents is a challenging task due to the stochastic nature of events. The ability to accurately predict how long accidents will last can provide significant benefits to both end-users in their route choice and traffic operation managers in handling of non-recurrent traffic congestion. This paper presents a novel bi-level machine learning framework enhanced with outlier removal and intra-extra joint optimisation for predicting the incident duration on three heterogeneous data sets collected for both arterial roads and motorways from Sydney, Australia and San Francisco, U.S.A. Firstly, we use incident data logs to develop a binary classification prediction approach, which allows us to classify traffic incidents as short-term or long-term. We find the optimal threshold between short-term versus long-term traffic incident duration, targeting both class balance and prediction performance while also comparing the binary versus multi-class classification approaches. Secondly, for more granularity of the incident duration prediction to the minute level, we propose a new Intra-Extra Joint Optimisation algorithm (IEO-ML) which extends multiple baseline ML models tested against several regression scenarios across the data sets. Final results indicate that: a) 40-45 min is the best split threshold for identifying short versus long-term incidents and that these incidents should be modelled separately, b) our proposed IEO-ML approach significantly outperforms baseline ML models in 66% of all cases, showcasing its great potential for accurate incident duration prediction. Lastly, we evaluate the feature importance and show that time, location, incident type, incident reporting source and weather are among the top 10 critical factors which influence how long incidents will last.  ( 2 min )
    A State-Distribution Matching Approach to Non-Episodic Reinforcement Learning. (arXiv:2205.05212v1 [cs.LG])
    While reinforcement learning (RL) provides a framework for learning through trial and error, translating RL algorithms into the real world has remained challenging. A major hurdle to real-world application arises from the development of algorithms in an episodic setting where the environment is reset after every trial, in contrast with the continual and non-episodic nature of the real-world encountered by embodied agents such as humans and robots. Prior works have considered an alternating approach where a forward policy learns to solve the task and the backward policy learns to reset the environment, but what initial state distribution should the backward policy reset the agent to? Assuming access to a few demonstrations, we propose a new method, MEDAL, that trains the backward policy to match the state distribution in the provided demonstrations. This keeps the agent close to the task-relevant states, allowing for a mix of easy and difficult starting states for the forward policy. Our experiments show that MEDAL matches or outperforms prior methods on three sparse-reward continuous control tasks from the EARL benchmark, with 40% gains on the hardest task, while making fewer assumptions than prior works.  ( 2 min )
    To SMOTE, or not to SMOTE?. (arXiv:2201.08528v3 [cs.LG] UPDATED)
    Balancing the data before training a classifier is a popular technique to address the challenges of imbalanced binary classification in tabular data. Balancing is commonly achieved by duplication of minority samples or by generation of synthetic minority samples. While it is well known that balancing affects each classifier differently, most prior empirical studies did not include strong state-of-the-art (SOTA) classifiers as baselines. In this work, we are interested in understanding whether balancing is beneficial, particularly in the context of SOTA classifiers. Thus, we conduct extensive experiments considering three SOTA classifiers alongside the weaker learners used in previous investigations. Additionally, we carefully discern proper metrics, consistent and non-consistent algorithms and hyper-parameter selection methods and show that these have a significant impact on prediction quality and on the effectiveness of balancing. Our results support the known utility of balancing for weak classifiers. However, we find that balancing does not improve prediction performance for the strong ones. We further identify several other scenarios for which balancing is effective and observe that prior studies demonstrated the utility of balancing by focusing on these settings.  ( 2 min )
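As a rough illustration of what "generation of synthetic minority samples" can look like, the sketch below interpolates between a minority sample and one of its nearest minority neighbours, in the spirit of SMOTE. It is a minimal stand-in, not the setup evaluated in the paper: the function name `smote_like_oversample`, its parameters, and the default `k=5` are assumptions made for this example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote_like_oversample(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points built from minority-class samples.

    Hypothetical helper: each synthetic point lies on the segment between a
    randomly chosen minority sample and one of its k nearest minority
    neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance to the base sample.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: sq_dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for point in smote_like_oversample(minority, 3, k=2):
        print(point)
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region already spanned by the minority class; whether adding such points helps a given classifier is the question the abstract examines.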
    A Continual Deepfake Detection Benchmark: Dataset, Methods, and Essentials. (arXiv:2205.05467v1 [cs.CV])
    A number of benchmarks and techniques for the detection of deepfakes have emerged. However, very few works study the detection of incrementally appearing deepfakes in real-world scenarios. To simulate in-the-wild conditions, this paper suggests a continual deepfake detection benchmark (CDDB) over a new collection of deepfakes from both known and unknown generative models. The suggested CDDB designs multiple evaluations of detection over easy, hard, and long sequences of deepfake tasks, with a set of appropriate measures. In addition, we exploit multiple approaches to adapt multiclass incremental learning methods, commonly used in continual visual recognition, to the continual deepfake detection problem. We evaluate several methods, including the adapted ones, on the proposed CDDB. Within the proposed benchmark, we explore some commonly known essentials of standard continual learning. Our study provides new insights on these essentials in the context of continual deepfake detection. The suggested CDDB is clearly more challenging than the existing benchmarks, and thus offers a suitable evaluation avenue for future research. Our benchmark dataset and the source code will be made publicly available.  ( 2 min )
    Probability Distribution of Hypervolume Improvement in Bi-objective Bayesian Optimization. (arXiv:2205.05505v1 [cs.LG])
    This work provides the exact expression of the probability distribution of the hypervolume improvement (HVI) for the bi-objective generalization of Bayesian optimization. Here, instead of a single-objective improvement, we consider the improvement of the hypervolume indicator concerning the current best approximation of the Pareto front. Gaussian process regression models are trained independently on both objective functions, resulting in a bi-variate separated Gaussian distribution serving as a predictive model for the vector-valued objective function. Some common HVI-based acquisition functions (probability of improvement and upper confidence bound) are also leveraged with the help of the exact distribution of HVI. In addition, we show the superior numerical accuracy and efficiency of the exact distribution compared to the commonly used approximation by Monte-Carlo sampling. Finally, we benchmark distribution-leveraged acquisition functions on the widely applied ZDT problem set, demonstrating a significant advantage of using the exact distribution of HVI in multi-objective Bayesian optimization.  ( 2 min )
    Efficient Risk-Averse Reinforcement Learning. (arXiv:2205.05138v1 [cs.LG])
    In risk-averse reinforcement learning (RL), the goal is to optimize some risk measure of the returns. A risk measure often focuses on the worst returns out of the agent's experience. As a result, standard methods for risk-averse RL often ignore high-return strategies. We prove that under certain conditions this inevitably leads to a local-optimum barrier, and propose a soft risk mechanism to bypass it. We also devise a novel Cross Entropy module for risk sampling, which (1) preserves risk aversion despite the soft risk; (2) independently improves sample efficiency. By separating the risk aversion of the sampler and the optimizer, we can sample episodes with poor conditions, yet optimize with respect to successful strategies. We combine these two concepts in CeSoR - Cross-entropy Soft-Risk optimization algorithm - which can be applied on top of any risk-averse policy gradient (PG) method. We demonstrate improved risk aversion in maze navigation, autonomous driving, and resource allocation benchmarks, including in scenarios where standard risk-averse PG completely fails.  ( 2 min )
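To make "a risk measure often focuses on the worst returns" concrete, here is a minimal sketch of one common choice, the empirical conditional value at risk (CVaR): the mean of the worst alpha-fraction of episode returns. Treating CVaR as the risk measure is an assumption for illustration, and the name `empirical_cvar` and the sample numbers are made up for this example; nothing here reproduces CeSoR itself.

```python
import math


def empirical_cvar(returns, alpha=0.05):
    """Mean of the worst ceil(alpha * n) returns (the lower tail)."""
    if not returns:
        raise ValueError("returns must be non-empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    ordered = sorted(returns)  # worst (lowest) returns first
    tail_size = max(1, math.ceil(alpha * len(ordered)))
    tail = ordered[:tail_size]
    return sum(tail) / len(tail)


if __name__ == "__main__":
    episode_returns = [10, -5, 3, 8, -20, 1, 4, 7, 2, 6]
    print(empirical_cvar(episode_returns, alpha=0.5))  # mean of the 5 worst
    print(empirical_cvar(episode_returns, alpha=1.0))  # plain mean
```

Raising the best returns in `episode_returns` leaves the tail average unchanged as long as they stay outside the worst alpha-fraction, which is one way to see why an optimizer driven only by this quantity can overlook high-return strategies.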
    Developing cooperative policies for multi-stage reinforcement learning tasks. (arXiv:2205.05230v1 [cs.LG])
    Many hierarchical reinforcement learning algorithms utilise a series of independent skills as a basis to solve tasks at a higher level of reasoning. These algorithms don't consider the value of using skills that are cooperative instead of independent. This paper proposes the Cooperative Consecutive Policies (CCP) method of enabling consecutive agents to cooperatively solve long time horizon multi-stage tasks. This method is achieved by modifying the policy of each agent to maximise both the current and next agent's critic. Cooperatively maximising critics allows each agent to take actions that are beneficial for its task as well as subsequent tasks. Using this method in a multi-room maze domain and a peg in hole manipulation domain, the cooperative policies were able to outperform a set of naive policies, a single agent trained across the entire domain, as well as another sequential HRL algorithm.  ( 2 min )
    AutoKE: An automatic knowledge embedding framework for scientific machine learning. (arXiv:2205.05390v1 [cs.LG])
    Imposing physical constraints on neural networks as a method of knowledge embedding has achieved great progress in solving physical problems described by governing equations. However, for many engineering problems, governing equations often have complex forms, including complex partial derivatives or stochastic physical fields, which results in significant inconveniences from the perspective of implementation. In this paper, a scientific machine learning framework, called AutoKE, is proposed, and a reservoir flow problem is taken as an instance to demonstrate that this framework can effectively automate the process of embedding physical knowledge. In AutoKE, an emulator comprised of deep neural networks (DNNs) is built for predicting the physical variables of interest. An arbitrarily complex equation can be parsed and automatically converted into a computational graph through the equation parser module, and the fitness of the emulator to the governing equation is evaluated via automatic differentiation. Furthermore, the fixed weights in the loss function are substituted with adaptive weights by incorporating the Lagrangian dual method. Neural architecture search (NAS) is also introduced into the AutoKE to select an optimal network architecture of the emulator according to the specific problem. Finally, we apply transfer learning to enhance the scalability of the emulator. In experiments, the framework is verified by a series of physical problems in which it can automatically embed physical knowledge into an emulator without heavy hand-coding. The results demonstrate that the emulator can not only make accurate predictions, but also be applied to similar problems with high efficiency via transfer learning.  ( 2 min )
    Robustness of Humans and Machines on Object Recognition with Extreme Image Transformations. (arXiv:2205.05167v1 [cs.CV])
    Recent neural network architectures have claimed to explain data from the human visual cortex. Their demonstrated performance is, however, still limited by the dependence on exploiting low-level features for solving visual tasks. This strategy limits their performance in case of out-of-distribution/adversarial data. Humans, meanwhile, learn abstract concepts and are mostly unaffected by even extreme image distortions. Humans and networks employ strikingly different strategies to solve visual tasks. To probe this, we introduce a novel set of image transforms and evaluate humans and networks on an object recognition task. We found that performance for a few common networks decreases quickly, while humans are able to recognize objects with high accuracy.  ( 2 min )
    AutoTransfer: Subject Transfer Learning with Censored Representations on Biosignals Data. (arXiv:2112.09796v2 [cs.LG] UPDATED)
    We provide a regularization framework for subject transfer learning in which we seek to train an encoder and classifier to minimize classification loss, subject to a penalty measuring independence between the latent representation and the subject label. We introduce three notions of independence and corresponding penalty terms using mutual information or divergence as a proxy for independence. For each penalty term, we provide several concrete estimation algorithms, using analytic methods as well as neural critic functions. We provide a hands-off strategy for applying this diverse family of regularization algorithms to a new dataset, which we call "AutoTransfer". We evaluate the performance of these individual regularization strategies and our AutoTransfer method on EEG, EMG, and ECoG datasets, showing that these approaches can improve subject transfer learning for challenging real-world datasets.  ( 2 min )
    A Machine Learning Analysis of COVID-19 Mental Health Data. (arXiv:2112.00227v2 [cs.LG] UPDATED)
    In late December 2019, the novel coronavirus (SARS-CoV-2) and the resulting disease COVID-19 were first identified in Wuhan, China. The disease slipped through containment measures, with the first known case in the United States being identified on January 20th, 2020. In this paper, we utilize survey data from the Inter-university Consortium for Political and Social Research and apply several statistical and machine learning models and techniques such as Decision Trees, Multinomial Logistic Regression, Naive Bayes, k-Nearest Neighbors, Support Vector Machines, Neural Networks, Random Forests, Gradient Tree Boosting, XGBoost, CatBoost, LightGBM, Synthetic Minority Oversampling, and Chi-Squared Test to analyze the impacts the COVID-19 pandemic has had on the mental health of frontline workers in the United States. Through the interpretation of the many models applied to the mental health survey data, we have concluded that the most important factor in predicting the mental health decline of a frontline worker is the healthcare role the individual is in (Nurse, Emergency Room Staff, Surgeon, etc.), followed by the amount of sleep the individual has had in the last week, the amount of COVID-19 related news an individual has consumed on average in a day, the age of the worker, and the usage of alcohol and cannabis.  ( 2 min )
    Multifidelity data fusion in convolutional encoder/decoder networks. (arXiv:2205.05187v1 [cs.LG])
    We analyze the regression accuracy of convolutional neural networks assembled from encoders, decoders and skip connections and trained with multifidelity data. Besides requiring significantly fewer trainable parameters than equivalent fully connected networks, encoder, decoder, encoder-decoder or decoder-encoder architectures can learn the mapping from inputs to outputs of arbitrary dimensionality. We demonstrate their accuracy when trained on a few high-fidelity and many low-fidelity data generated from models ranging from one-dimensional functions to Poisson equation solvers in two dimensions. We finally discuss a number of implementation choices that improve the reliability of the uncertainty estimates generated by Monte Carlo DropBlocks, and compare uncertainty estimates among low-, high- and multifidelity approaches.  ( 2 min )
    Unsupervised machine learning for physical concepts. (arXiv:2205.05279v1 [cs.LG])
    In recent years, machine learning methods have been used to assist scientists in scientific research. Human scientific theories are based on a series of concepts. How a machine learns these concepts from experimental data is an important first step. We propose a hybrid method to extract interpretable physical concepts through unsupervised machine learning. This method consists of two stages. First, we find the Betti numbers of the experimental data. Second, given the Betti numbers, we use a variational autoencoder network to extract meaningful physical variables. We test our protocol on toy models and show how it works.  ( 2 min )
    Learning and Evaluating Graph Neural Network Explanations based on Counterfactual and Factual Reasoning. (arXiv:2202.08816v3 [cs.IR] UPDATED)
    Structural data is widespread in Web applications, such as social networks in social media, citation networks in academic websites, and thread data in online forums. Due to the complex topology, it is difficult to process and make use of the rich information within such data. Graph Neural Networks (GNNs) have shown great advantages in learning representations for structural data. However, the non-transparency of deep learning models makes it non-trivial to explain and interpret the predictions made by GNNs. Meanwhile, it is also a big challenge to evaluate GNN explanations, since in many cases the ground-truth explanations are unavailable. In this paper, we draw on insights from Counterfactual and Factual (CF^2) reasoning in causal inference theory to solve both the learning and evaluation problems in explainable GNNs. For generating explanations, we propose a model-agnostic framework by formulating an optimization problem based on both of the two causal perspectives. This distinguishes CF^2 from previous explainable GNNs that only consider one of them. Another contribution of the work is the evaluation of GNN explanations. For quantitatively evaluating the generated explanations without requiring ground truth, we design metrics based on Counterfactual and Factual reasoning to evaluate the necessity and sufficiency of the explanations. Experiments show that, whether or not ground-truth explanations are available, CF^2 generates better explanations than previous state-of-the-art methods on real-world datasets. Moreover, the statistical analysis justifies the correlation between the performance on ground-truth evaluation and our proposed metrics. Source code is available at https://github.com/chrisjtan/gnn_cff.
    Self-Supervised Anomaly Detection: A Survey and Outlook. (arXiv:2205.05173v1 [cs.LG])
    Over the past few years, anomaly detection, a subfield of machine learning that is mainly concerned with the detection of rare events, witnessed an immense improvement following the unprecedented growth of deep learning models. Recently, the emergence of self-supervised learning has sparked the development of new anomaly detection algorithms that surpassed state-of-the-art accuracy by a significant margin. This paper aims to review the current approaches in self-supervised anomaly detection. We present technical details of the common approaches and discuss their strengths and drawbacks. We also compare the performance of these models against each other and other state-of-the-art anomaly detection models. Finally, we discuss a variety of new directions for improving the existing algorithms.  ( 2 min )
    Keep Your Friends Close and Your Counterfactuals Closer: Improved Learning From Closest Rather Than Plausible Counterfactual Explanations in an Abstract Setting. (arXiv:2205.05515v1 [cs.AI])
    Counterfactual explanations (CFEs) highlight what changes to a model's input would have changed its prediction in a particular way. CFEs have gained considerable traction as a psychologically grounded solution for explainable artificial intelligence (XAI). Recent innovations introduce the notion of computational plausibility for automatically generated CFEs, enhancing their robustness by exclusively creating plausible explanations. However, the practical benefits of such a constraint on user experience and behavior are as yet unclear. In this study, we evaluate objective and subjective usability of computationally plausible CFEs in an iterative learning design targeting novice users. We rely on a novel, game-like experimental design, revolving around an abstract scenario. Our results show that novice users actually benefit less from receiving computationally plausible rather than closest CFEs that produce minimal changes leading to the desired outcome. Responses in a post-game survey reveal no differences in terms of subjective user experience between the two groups. Following the view of psychological plausibility as comparative similarity, this may be explained by the fact that users in the closest condition experience their CFEs as more psychologically plausible than the computationally plausible counterpart. In sum, our work highlights a little-considered divergence of definitions of computational plausibility and psychological plausibility, critically confirming the need to incorporate human behavior, preferences and mental models already at the design stages of XAI approaches. In the interest of reproducible research, all source code, acquired user data, and evaluation scripts of the current study are available: https://github.com/ukuhl/PlausibleAlienZoo
    ConfLab: A Rich Multimodal Multisensor Dataset of Free-Standing Social Interactions In-the-Wild. (arXiv:2205.05177v1 [cs.MM])
    We describe an instantiation of a new concept for multimodal multisensor data collection of real life in-the-wild free standing social interactions in the form of a Conference Living Lab (ConfLab). ConfLab contains high fidelity data of 49 people during a real-life professional networking event capturing a diverse mix of status, acquaintanceship, and networking motivations at an international conference. Recording such a dataset is challenging due to the delicate trade-off between participant privacy and fidelity of the data, and the technical and logistic challenges involved. We improve upon prior datasets in the fidelity of most of our modalities: 8-camera overhead setup, personal wearable sensors recording body motion (9-axis IMU), Bluetooth-based proximity, and low-frequency audio. Additionally, we use a state-of-the-art hardware synchronization solution and time-efficient continuous technique for annotating body keypoints and actions at high frequencies. We argue that our improvements are essential for a deeper study of interaction dynamics at finer time scales. Our research tasks showcase some of the open challenges related to in-the-wild privacy-preserving social data analysis: keypoints detection from overhead camera views, skeleton based no-audio speaker detection, and F-formation detection. With the ConfLab dataset, we aim to bridge the gap between traditional computer vision tasks and in-the-wild ecologically valid socially-motivated tasks.  ( 2 min )
    Exploring Local Explanations of Nonlinear Models Using Animated Linear Projections. (arXiv:2205.05359v1 [stat.ML])
    The increased predictive power of nonlinear models comes at the cost of interpretability of its terms. This trade-off has led to the emergence of eXplainable AI (XAI). XAI attempts to shed light on how models use predictors to arrive at a prediction with local explanations, a point estimate of the linear feature importance in the vicinity of one instance. These can be considered linear projections and can be further explored to understand better the interactions between features used to make predictions across the predictive model surface. Here we describe interactive linear interpolation used for exploration at any instance and illustrate with examples with categorical (penguin species, chocolate types) and quantitative (soccer/football salaries, house prices) output. The methods are implemented in the R package cheem, available on CRAN.  ( 2 min )
    An Empirical Study Of Self-supervised Learning Approaches For Object Detection With Transformers. (arXiv:2205.05543v1 [cs.CV])
    Self-supervised learning (SSL) methods such as masked language modeling have shown massive performance gains by pretraining transformer models for a variety of natural language processing tasks. Follow-up research adapted similar methods, such as masked image modeling in vision transformers, and demonstrated improvements in the image classification task. Such simple self-supervised methods have not been exhaustively studied for object detection transformers (DETR, Deformable DETR), as their transformer encoder modules take input in the convolutional neural network (CNN) extracted feature space rather than the image space as in general vision transformers. However, the CNN feature maps still maintain the spatial relationship, and we utilize this property to design self-supervised learning approaches to train the encoder of object detection transformers in pretraining and multi-task learning settings. We explore common self-supervised methods based on image reconstruction, masked image modeling and jigsaw. Preliminary experiments on the iSAID dataset demonstrate faster convergence of DETR in the initial epochs in both pretraining and multi-task learning settings; nonetheless, similar improvement is not observed in the case of multi-task learning with Deformable DETR. The code for our experiments with DETR and Deformable DETR is available at https://github.com/gokulkarthik/detr and https://github.com/gokulkarthik/Deformable-DETR respectively.
    Detecting Emerging Technologies and their Evolution using Deep Learning and Weak Signal Analysis. (arXiv:2205.05449v1 [cs.AI])
    Emerging technologies can have major economic impacts and affect strategic stability. Yet, early identification of emerging technologies remains challenging. In order to identify emerging technologies in a timely and reliable manner, a comprehensive examination of relevant scientific and technological (S&T) trends and their related references is required. This examination is generally done by domain experts and requires significant amounts of time and effort to gain insights. The use of domain experts to identify emerging technologies from S&T trends may limit the capacity to analyse large volumes of information and introduce subjectivity in the assessments. Decision support systems are required to provide accurate and reliable evidence-based indicators through constant and continuous monitoring of the environment and help identify signals of emerging technologies that could alter security and economic prosperity. For example, the research field of hypersonics has recently witnessed several advancements having profound technological, commercial, and national security implications. In this work, we present a multi-layer quantitative approach able to identify future signs from scientific publications on hypersonics by leveraging deep learning and weak signal analysis. The proposed framework can help strategic planners and domain experts better identify and monitor emerging technology trends.
    A Framework for Machine Learning of Model Error in Dynamical Systems. (arXiv:2107.06658v2 [math.DS] UPDATED)
    The development of data-informed predictive models for dynamical systems is of widespread interest in many disciplines. We present a unifying framework for blending mechanistic and machine-learning approaches to identify dynamical systems from noisily and partially observed data. We compare pure data-driven learning with hybrid models which incorporate imperfect domain knowledge. Our formulation is agnostic to the chosen machine learning model, is presented in both continuous- and discrete-time settings, and is compatible both with model errors that exhibit substantial memory and errors that are memoryless. First, we study memoryless linear (w.r.t. parametric-dependence) model error from a learning theory perspective, defining excess risk and generalization error. For ergodic continuous-time systems, we prove that both excess risk and generalization error are bounded above by terms that diminish with the square-root of T, the time-interval over which training data is specified. Secondly, we study scenarios that benefit from modeling with memory, proving universal approximation theorems for two classes of continuous-time recurrent neural networks (RNNs): both can learn memory-dependent model error. In addition, we connect one class of RNNs to reservoir computing, thereby relating learning of memory-dependent error to recent work on supervised learning between Banach spaces using random features. Numerical results are presented (Lorenz '63, Lorenz '96 Multiscale systems) to compare purely data-driven and hybrid approaches, finding hybrid methods less data-hungry and more parametrically efficient. Finally, we demonstrate numerically how data assimilation can be leveraged to learn hidden dynamics from noisy, partially-observed data, and illustrate challenges in representing memory by this approach, and in the training of such models.
    A globally convergent fast iterative shrinkage-thresholding algorithm with a new momentum factor for single and multi-objective convex optimization. (arXiv:2205.05262v1 [math.OC])
    Convex-composite optimization, which minimizes an objective function represented by the sum of a differentiable function and a convex one, is widely used in machine learning and signal/image processing. Fast Iterative Shrinkage Thresholding Algorithm (FISTA) is a typical method for solving this problem and has a global convergence rate of $O(1 / k^2)$. Recently, this has been extended to multi-objective optimization, together with the proof of the $O(1 / k^2)$ global convergence rate. However, its momentum factor is classical, and the convergence of its iterates has not been proven. In this work, introducing some additional hyperparameters $(a, b)$, we propose another accelerated proximal gradient method with a general momentum factor, which is new even for the single-objective cases. We show that our proposed method also has a global convergence rate of $O(1/k^2)$ for any $(a,b)$, and further that the generated sequence of iterates converges to a weak Pareto solution when $a$ is positive, an essential property for the finite-time manifold identification. Moreover, we report numerical results with various $(a,b)$, showing that some of these choices give better results than the classical momentum factors.
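For readers unfamiliar with the baseline being generalized, the following is a minimal sketch of classical single-objective FISTA applied to a lasso problem, minimizing 0.5 * ||Ax - b||^2 + lam * ||x||_1. It uses the classical momentum update t_{k+1} = (1 + sqrt(1 + 4 t_k^2)) / 2, not the paper's new (a, b)-parameterized factor. The function names, the lasso objective, and the use of the squared Frobenius norm as a step-size bound are all assumptions chosen for illustration.

```python
import math


def soft_threshold(v, tau):
    """Proximal operator of tau * |.| applied to a scalar."""
    return math.copysign(max(abs(v) - tau, 0.0), v)


def fista_lasso(A, b, lam, n_iter=500):
    """Classical FISTA for min_x 0.5 * ||Ax - b||^2 + lam * ||x||_1.

    A is a list of rows, b a list of floats. Hypothetical helper for
    illustration only.
    """
    m, n = len(A), len(A[0])
    # ||A||_F^2 upper-bounds the Lipschitz constant ||A^T A||_2 of the
    # smooth part's gradient, so 1 / L is a safe step size.
    L = sum(a * a for row in A for a in row)
    x = [0.0] * n
    y = x[:]
    t = 1.0
    for _ in range(n_iter):
        residual = [sum(A[i][j] * y[j] for j in range(n)) - b[i] for i in range(m)]
        grad = [sum(A[i][j] * residual[i] for i in range(m)) for j in range(n)]
        # Proximal gradient step taken from the extrapolated point y.
        x_new = [soft_threshold(y[j] - grad[j] / L, lam / L) for j in range(n)]
        # Classical momentum factor.
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = [x_new[j] + (t - 1.0) / t_new * (x_new[j] - x[j]) for j in range(n)]
        x, t = x_new, t_new
    return x


if __name__ == "__main__":
    A = [[1.0, 0.0], [0.0, 1.0]]
    b = [3.0, 0.5]
    print(fista_lasso(A, b, lam=1.0))
```

With the identity matrix as `A`, the exact minimizer is the soft-thresholded `b`, namely `[2.0, 0.0]`, so the iterates can be checked against a known answer.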
    Extensible Machine Learning for Encrypted Network Traffic Application Labeling via Uncertainty Quantification. (arXiv:2205.05628v1 [cs.CR])
    With the increasing prevalence of encrypted network traffic, cyber security analysts have been turning to machine learning (ML) techniques to elucidate the traffic on their networks. However, ML models can become stale as known traffic features can shift between networks and as new traffic emerges that is outside of the distribution of the training set. In order to reliably adapt in this dynamic environment, ML models must additionally provide contextualized uncertainty quantification to their predictions, which has received little attention in the cyber security domain. Uncertainty quantification is necessary both to signal when the model is uncertain about which class to choose in its label assignment and when the traffic is not likely to belong to any pre-trained classes. We present a new, public dataset of network traffic that includes labeled, Virtual Private Network (VPN)-encrypted network traffic generated by 10 applications and corresponding to 5 application categories. We also present an ML framework that is designed to rapidly train with modest data requirements and provide both calibrated, predictive probabilities as well as an interpretable ``out-of-distribution'' (OOD) score to flag novel traffic samples. We describe how to compute a calibrated OOD score from p-values of the so-called relative Mahalanobis distance. We demonstrate that our framework achieves an F1 score of 0.98 on our dataset and that it can extend to an enterprise network by testing the model: (1) on data from similar applications, (2) on dissimilar application traffic from an existing category, and (3) on application traffic from a new category. The model correctly flags uncertain traffic and, upon retraining, accurately incorporates the new data. We additionally demonstrate good performance (F1 score of 0.97) when packet sizes are made to be uniform, as occurs for certain encryption protocols.
    Automatic Tuberculosis and COVID-19 cough classification using deep learning. (arXiv:2205.05480v1 [cs.LG])
    We present a deep learning based automatic cough classifier which can discriminate tuberculosis (TB) coughs from COVID-19 coughs and healthy coughs. Both TB and COVID-19 are respiratory diseases, have cough as a predominant symptom and claim thousands of lives each year. The cough audio recordings were collected in both indoor and outdoor settings and also uploaded using smartphones by subjects around the globe, and thus contain various levels of noise. This cough data includes 1.68 hours of TB coughs, 18.54 minutes of COVID-19 coughs and 1.69 hours of healthy coughs from 47 TB patients, 229 COVID-19 patients and 1498 healthy patients, and was used to train and evaluate a CNN, LSTM and Resnet50. These three deep architectures were also pre-trained on 2.14 hours of sneeze, 2.91 hours of speech and 2.79 hours of noise for improved performance. The class imbalance in our dataset was addressed by using the SMOTE data balancing technique and performance metrics such as F1-score and AUC. Our study shows that the highest F1-scores of 0.9259 and 0.8631 have been achieved by a pre-trained Resnet50 for the two-class (TB vs COVID-19) and three-class (TB vs COVID-19 vs healthy) cough classification tasks, respectively. The application of deep transfer learning has improved the classifiers' performance and makes them more robust as they generalise better over the cross-validation folds. Their performance exceeds the TB triage test requirements set by the World Health Organisation (WHO). The features producing the best performance contain higher-order MFCCs, suggesting that the differences between TB and COVID-19 coughs are not perceivable by the human ear. This type of cough audio classification is non-contact and cost-effective and can easily be deployed on a smartphone; thus it can be an excellent tool for both TB and COVID-19 screening.
    FairNeuron: Improving Deep Neural Network Fairness with Adversary Games on Selective Neurons. (arXiv:2204.02567v2 [cs.LG] UPDATED)
    With Deep Neural Networks (DNNs) being integrated into a growing number of critical systems with far-reaching impacts on society, there are increasing concerns about their ethical performance, such as fairness. Unfortunately, model fairness and accuracy are in many cases contradictory goals to optimize. To solve this issue, a number of works have tried to improve model fairness by using an adversarial game at the model level. This approach introduces an adversary that evaluates the fairness of a model besides its prediction accuracy on the main task, and performs joint optimization to achieve a balanced result. In this paper, we observe that when performing backward propagation based training, this contradictory phenomenon also appears at the individual neuron level. Based on this observation, we propose FairNeuron, an automatic DNN model repair tool, to mitigate fairness concerns and balance the accuracy-fairness trade-off without introducing another model. It works by detecting neurons with contradictory optimization directions from accuracy and fairness training goals, and achieving a trade-off by selective dropout. Compared with state-of-the-art methods, our approach is lightweight, making it scalable and more efficient. Our evaluation on 3 datasets shows that FairNeuron can effectively improve all models' fairness while maintaining a stable utility.
    Subspace Learning Machine (SLM): Methodology and Performance. (arXiv:2205.05296v1 [cs.LG])
    Inspired by the feedforward multilayer perceptron (FF-MLP), decision tree (DT) and extreme learning machine (ELM), a new classification model, called the subspace learning machine (SLM), is proposed in this work. SLM first identifies a discriminant subspace, $S^0$, by examining the discriminant power of each input feature. Then, it uses probabilistic projections of features in $S^0$ to yield 1D subspaces and finds the optimal partition for each of them. This is equivalent to partitioning $S^0$ with hyperplanes. A criterion is developed to choose the best $q$ partitions that yield $2q$ partitioned subspaces among them. We assign $S^0$ to the root node of a decision tree and the intersections of $2q$ subspaces to its child nodes of depth one. The partitioning process is recursively applied at each child node to build an SLM tree. When the samples at a child node are sufficiently pure, the partitioning process stops and each leaf node makes a prediction. The idea can be generalized to regression, leading to the subspace learning regressor (SLR). Furthermore, ensembles of SLM/SLR trees can yield a stronger predictor. Extensive experiments are conducted for performance benchmarking among SLM/SLR trees, ensembles and classical classifiers/regressors.
    Robust Data-Driven Output Feedback Control via Bootstrapped Multiplicative Noise. (arXiv:2205.05119v1 [eess.SY])
    We propose a robust data-driven output feedback control algorithm that explicitly incorporates inherent finite-sample model estimate uncertainties into the control design. The algorithm has three components: (1) a subspace identification nominal model estimator; (2) a bootstrap resampling method that quantifies non-asymptotic variance of the nominal model estimate; and (3) a non-conventional robust control design method comprising a coupled optimal dynamic output feedback filter and controller with multiplicative noise. A key advantage of the proposed approach is that the system identification and robust control design procedures both use stochastic uncertainty representations, so that the actual inherent statistical estimation uncertainty directly aligns with the uncertainty the robust controller is being designed against. Moreover, the control design method accommodates a highly structured uncertainty representation that can capture uncertainty shape more effectively than existing approaches. We show through numerical experiments that the proposed robust data-driven output feedback controller can significantly outperform a certainty equivalent controller on various measures of sample complexity and stability robustness.
    Ranked Prioritization of Groups in Combinatorial Bandit Allocation. (arXiv:2205.05659v1 [cs.AI])
    Preventing poaching through ranger patrols protects endangered wildlife, directly contributing to the UN Sustainable Development Goal 15 of life on land. Combinatorial bandits have been used to allocate limited patrol resources, but existing approaches overlook the fact that each location is home to multiple species in varying proportions, so a patrol benefits each species to differing degrees. When some species are more vulnerable, we ought to offer more protection to these animals; unfortunately, existing combinatorial bandit approaches do not offer a way to prioritize important species. To bridge this gap, (1) We propose a novel combinatorial bandit objective that trades off between reward maximization and also accounts for prioritization over species, which we call ranked prioritization. We show this objective can be expressed as a weighted linear sum of Lipschitz-continuous reward functions. (2) We provide RankedCUCB, an algorithm to select combinatorial actions that optimize our prioritization-based objective, and prove that it achieves asymptotic no-regret. (3) We demonstrate empirically that RankedCUCB leads to up to 38% improvement in outcomes for endangered species using real-world wildlife conservation data. Along with adapting to other challenges such as preventing illegal logging and overfishing, our no-regret algorithm addresses the general combinatorial bandit problem with a weighted linear objective.
    Stable and Interpretable Unrolled Dictionary Learning. (arXiv:2106.00058v4 [cs.LG] UPDATED)
    The dictionary learning problem, representing data as a combination of a few atoms, has long stood as a popular method for learning representations in statistics and signal processing. The most popular dictionary learning algorithm alternates between sparse coding and dictionary update steps, and a rich literature has studied its theoretical convergence. The success of dictionary learning relies on access to a "good" initial estimate of the dictionary and the ability of the sparse coding step to provide an unbiased estimate of the code. The growing popularity of unrolled sparse coding networks has led to the empirical finding that backpropagation through such networks performs dictionary learning. We offer the first theoretical analysis of these empirical results through PUDLE, a Provable Unrolled Dictionary LEarning method. We provide conditions on the network initialization and data distribution sufficient to recover and preserve the support of the latent sparse representation. Additionally, we address two challenges: first, the vanilla unrolled sparse coding computes a biased code estimate, and second, gradients during backpropagated learning can become unstable. We show approaches to reduce the bias of the code estimate in the forward pass, and that of the dictionary estimate in the backward pass. We propose strategies to resolve the learning instability. This is achieved by tuning network parameters and modifying the loss function. Overall, we highlight the impact of loss, unrolling, and backpropagation on convergence. We complement our findings through synthetic and image denoising experiments. Finally, we demonstrate PUDLE's interpretability, a driving factor in designing deep networks based on iterative optimizations, by building a mathematical relation between network weights, its output, and the training set.
    Analysis of convolutional neural network image classifiers in a rotationally symmetric model. (arXiv:2205.05500v1 [stat.ML])
    Convolutional neural network image classifiers are defined and the rate of convergence of the misclassification risk of the estimates towards the optimal misclassification risk is analyzed. Here we consider images as random variables with values in some functional space, where we only observe discrete samples as function values on some finite grid. Under suitable structural and smoothness assumptions on the functional a posteriori probability, which includes some kind of symmetry against rotation of subparts of the input image, it is shown that least squares plug-in classifiers based on convolutional neural networks are able to circumvent the curse of dimensionality in binary image classification if we neglect a resolution-dependent error term. The finite sample size behavior of the classifier is analyzed by applying it to simulated and real data.
    Contextual Search in the Presence of Adversarial Corruptions. (arXiv:2002.11650v5 [cs.LG] UPDATED)
    We study contextual search, a generalization of binary search in higher dimensions, which captures settings such as feature-based dynamic pricing. Standard formulations of this problem assume that agents act in accordance with a specific homogeneous response model. In practice, however, some responses may be adversarially corrupted. Existing algorithms heavily depend on the assumed response model being (approximately) accurate for all agents and have poor performance in the presence of even a few such arbitrary misspecifications. We initiate the study of contextual search when some of the agents can behave in ways inconsistent with the underlying response model. In particular, we provide two algorithms, one based on multidimensional binary search methods and one based on gradient descent. We show that these algorithms attain near-optimal regret in the absence of adversarial corruptions and their performance degrades gracefully with the number of such agents, providing the first results for contextual search in any adversarial noise model. Our techniques draw inspiration from learning theory, game theory, high-dimensional geometry, and convex analysis.
    Autoencoder Attractors for Uncertainty Estimation. (arXiv:2204.00382v2 [cs.LG] UPDATED)
    The reliability assessment of a machine learning model's prediction is an important quantity for the deployment in safety critical applications. Not only can it be used to detect novel sceneries, either as out-of-distribution or anomaly sample, but it also helps to determine deficiencies in the training data distribution. A lot of promising research directions have either proposed traditional methods like Gaussian processes or extended deep learning based approaches, for example, by interpreting them from a Bayesian point of view. In this work we propose a novel approach for uncertainty estimation based on autoencoder models: The recursive application of a previously trained autoencoder model can be interpreted as a dynamical system storing training examples as attractors. While input images close to known samples will converge to the same or similar attractor, input samples containing unknown features are unstable and converge to different training samples by potentially removing or changing characteristic features. The use of dropout during training and inference leads to a family of similar dynamical systems, each one being robust on samples close to the training distribution but unstable on new features. Either the model reliably removes these features or the resulting instability can be exploited to detect problematic input samples. We evaluate our approach on several dataset combinations as well as on an industrial application for occupant classification in the vehicle interior for which we additionally release a new synthetic dataset.
    Accelerated Reinforcement Learning for Temporal Logic Control Objectives. (arXiv:2205.04424v2 [cs.RO] UPDATED)
    This paper addresses the problem of learning control policies for mobile robots modeled as unknown Markov Decision Processes (MDPs) that are tasked with temporal logic missions, such as sequencing, coverage, or surveillance. The MDP captures uncertainty in the workspace structure and the outcomes of control decisions. The control objective is to synthesize a control policy that maximizes the probability of accomplishing a high-level task, specified as a Linear Temporal Logic (LTL) formula. To address this problem, we propose a novel accelerated model-based reinforcement learning (RL) algorithm for LTL control objectives that is capable of learning control policies significantly faster than related approaches. Its sample-efficiency relies on biasing exploration towards directions that may contribute to task satisfaction. This is accomplished by leveraging an automaton representation of the LTL task as well as a continuously learned MDP model. Finally, we provide extensive comparative experiments that demonstrate the sample efficiency of the proposed method against recent temporal logic RL methods.
    Machine Learning to Support Triage of Children at Risk for Epileptic Seizures in the Pediatric Intensive Care Unit. (arXiv:2205.05389v1 [cs.LG])
    Objective: Epileptic seizures are relatively common in critically-ill children admitted to the pediatric intensive care unit (PICU) and thus serve as an important target for identification and treatment. Most of these seizures have no discernible clinical manifestation but still have a significant impact on morbidity and mortality. Children that are deemed at risk for seizures within the PICU are monitored using continuous-electroencephalogram (cEEG). The cost of cEEG monitoring is considerable and, as the number of available machines is always limited, clinicians need to resort to triaging patients according to perceived risk in order to allocate resources. This research aims to develop a computer aided tool to improve seizure risk assessment in critically-ill children, using a ubiquitously recorded signal in the PICU, namely the electrocardiogram (ECG). Approach: A novel data-driven model was developed at a patient-level approach, based on features extracted from the first hour of ECG recording and the clinical data of the patient. Main results: The most predictive features were the age of the patient, the brain injury as coma etiology and the QRS area. For patients without any prior clinical data, using one hour of ECG recording, the classification performance of the random forest classifier reached an area under the receiver operating characteristic curve (AUROC) score of 0.84. When combining ECG features with the patient's clinical history, the AUROC reached 0.87. Significance: Taking a real clinical scenario, we estimated that our clinical decision support triage tool can improve the positive predictive value by more than 59% over the clinical standard.
    An Inexact Augmented Lagrangian Algorithm for Training Leaky ReLU Neural Network with Group Sparsity. (arXiv:2205.05428v1 [math.OC])
    The leaky ReLU network with a group sparse regularization term has been widely used in the recent years. However, training such a network yields a nonsmooth nonconvex optimization problem and there exists a lack of approaches to compute a stationary point deterministically. In this paper, we first resolve the multi-layer composite term in the original optimization problem by introducing auxiliary variables and additional constraints. We show the new model has a nonempty and bounded solution set and its feasible set satisfies the Mangasarian-Fromovitz constraint qualification. Moreover, we show the relationship between the new model and the original problem. Remarkably, we propose an inexact augmented Lagrangian algorithm for solving the new model and show the convergence of the algorithm to a KKT point. Numerical experiments demonstrate that our algorithm is more efficient for training sparse leaky ReLU neural networks than some well-known algorithms.
    Pixelated Butterfly: Simple and Efficient Sparse training for Neural Network Models. (arXiv:2112.00029v2 [cs.LG] UPDATED)
    Overparameterized neural networks generalize well but are expensive to train. Ideally, one would like to reduce their computational cost while retaining their generalization benefits. Sparse model training is a simple and promising approach to achieve this, but there remain challenges as existing methods struggle with accuracy loss, slow training runtime, or difficulty in sparsifying all model components. The core problem is that searching for a sparsity mask over a discrete set of sparse matrices is difficult and expensive. To address this, our main insight is to optimize over a continuous superset of sparse matrices with a fixed structure known as products of butterfly matrices. As butterfly matrices are not hardware efficient, we propose simple variants of butterfly (block and flat) to take advantage of modern hardware. Our method (Pixelated Butterfly) uses a simple fixed sparsity pattern based on flat block butterfly and low-rank matrices to sparsify most network layers (e.g., attention, MLP). We empirically validate that Pixelated Butterfly is 3x faster than butterfly and speeds up training to achieve favorable accuracy--efficiency tradeoffs. On the ImageNet classification and WikiText-103 language modeling tasks, our sparse models train up to 2.5x faster than the dense MLP-Mixer, Vision Transformer, and GPT-2 medium with no drop in accuracy.
    Hitting time for Markov decision process. (arXiv:2205.03476v2 [cs.LG] UPDATED)
    We define the hitting time for a Markov decision process (MDP). We do not use the hitting time of the Markov process induced by the MDP because the induced chain may not have a stationary distribution. Even if it has a stationary distribution, the stationary distribution may not coincide with the (normalized) occupancy measure of the MDP. We observe a relationship between the MDP and PageRank. Using this observation, we construct a Markov process (MP) whose stationary distribution coincides with the normalized occupancy measure of the MDP and we define the hitting time of the MDP as the hitting time of the associated MP.
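    For readers unfamiliar with the underlying notion, the sketch below computes the classical expected hitting time of a target state in a finite Markov chain, using the standard recursion h(target) = 0 and h(i) = 1 + sum_j P[i][j] * h(j). It does not implement the paper's MDP construction; the function name `expected_hitting_times`, the iteration count and the 3-state transition matrix are assumptions chosen for illustration.

```python
def expected_hitting_times(P, target, iterations=1000):
    """Expected number of steps to first reach `target` from each state,
    found by repeatedly applying h(i) = 1 + sum_j P[i][j] * h(j)."""
    n = len(P)
    h = [0.0] * n
    for _ in range(iterations):
        h = [
            0.0 if i == target
            else 1.0 + sum(P[i][j] * h[j] for j in range(n))
            for i in range(n)
        ]
    return h

# Hypothetical chain: a lazy walk on states 0, 1, 2 with state 2 absorbing.
P = [
    [0.5, 0.5, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0],
]
hitting = expected_hitting_times(P, target=2)
```

    For this toy chain the recursion can also be solved by hand: h(1) = 1 + 0.5 h(0) and h(0) = 1 + 0.5 h(0) + 0.5 h(1), giving h(1) = 4 and h(0) = 6, which the iteration approaches.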
    Reducing Activation Recomputation in Large Transformer Models. (arXiv:2205.05198v1 [cs.LG])
    Training large transformer models is one of the most important computational challenges of modern AI. In this paper, we show how to significantly accelerate training of large transformer models by reducing activation recomputation. Activation recomputation is commonly used to work around memory capacity constraints. Rather than storing activations for backpropagation, they are traditionally recomputed, which saves memory but adds redundant compute. In this work, we show most of this redundant compute is unnecessary because we can reduce memory consumption sufficiently without it. We present two novel yet very simple techniques: sequence parallelism and selective activation recomputation. In conjunction with tensor parallelism, these techniques almost eliminate the need to recompute activations. We evaluate our approach on language models up to one trillion parameters in scale and show that our method reduces activation memory by 5x, while reducing execution time overhead from activation recomputation by over 90%. For example, when training a 530B parameter GPT-3 style model on 2240 NVIDIA A100 GPUs, we achieve a Model Flops Utilization of 54.2%, which is 29% faster than the 42.1% we achieve using recomputation. Our implementation will be available in both Megatron-LM and NeMo-Megatron.
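    The "29% faster" figure in the abstract above can be checked from the two utilization numbers it quotes. The sketch assumes, as the abstract appears to, that both runs use the same model and hardware, so that throughput scales in proportion to Model Flops Utilization; the variable names are illustrative only.

```python
# Model Flops Utilization values quoted in the abstract (percent).
mfu_with_techniques = 54.2     # sequence parallelism + selective recomputation
mfu_with_recomputation = 42.1  # full activation recomputation baseline

# With identical model and hardware, throughput is proportional to MFU.
speedup = mfu_with_techniques / mfu_with_recomputation
percent_faster = (speedup - 1.0) * 100.0
```

    The ratio is about 1.287, which rounds to the 29% improvement stated in the abstract.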
    Pre-trained Language Models as Re-Annotators. (arXiv:2205.05368v1 [cs.CL])
    Annotation noise is widespread in datasets, but manually revising a flawed corpus is time-consuming and error-prone. Hence, given the prior knowledge in Pre-trained Language Models and the expected uniformity across all annotations, we attempt to reduce annotation noise in the corpus through two tasks automatically: (1) Annotation Inconsistency Detection that indicates the credibility of annotations, and (2) Annotation Error Correction that rectifies the abnormal annotations. We investigate how to acquire semantically sensitive annotation representations from Pre-trained Language Models, expecting to embed the examples with identical annotations to the mutually adjacent positions even without fine-tuning. We propose a novel credibility score to reveal the likelihood of annotation inconsistencies based on the neighbouring consistency. Then, we fine-tune the Pre-trained Language Model based classifier with cross-validation for annotation correction. The annotation corrector is further elaborated with two approaches: (1) soft labelling by Kernel Density Estimation and (2) a novel distant-peer contrastive loss. We study the re-annotation in relation extraction and create a new manually revised dataset, Re-DocRED, for evaluating document-level re-annotation. The proposed credibility scores show promising agreement with human revisions, achieving a Binary F1 of 93.4 and 72.5 in detecting inconsistencies on TACRED and DocRED respectively. Moreover, the neighbour-aware classifiers based on distant-peer contrastive learning and uncertain labels achieve Macro F1 up to 66.2 and 57.8 in correcting annotations on TACRED and DocRED respectively. These improvements are not merely theoretical: automatically denoised training sets demonstrate up to 3.6% performance improvement for state-of-the-art relation extraction models.
    The First Optimal Algorithm for Smooth and Strongly-Convex-Strongly-Concave Minimax Optimization. (arXiv:2205.05653v1 [math.OC])
    In this paper, we revisit the smooth and strongly-convex-strongly-concave minimax optimization problem. Zhang et al. (2021) and Ibrahim et al. (2020) established the lower bound $\Omega\left(\sqrt{\kappa_x\kappa_y} \log \frac{1}{\epsilon}\right)$ on the number of gradient evaluations required to find an $\epsilon$-accurate solution, where $\kappa_x$ and $\kappa_y$ are condition numbers for the strong convexity and strong concavity assumptions. However, the existing state-of-the-art methods do not match this lower bound: algorithms of Lin et al. (2020) and Wang and Li (2020) have gradient evaluation complexity $\mathcal{O}\left( \sqrt{\kappa_x\kappa_y}\log^3\frac{1}{\epsilon}\right)$ and $\mathcal{O}\left( \sqrt{\kappa_x\kappa_y}\log^3 (\kappa_x\kappa_y)\log\frac{1}{\epsilon}\right)$, respectively. We fix this fundamental issue by providing the first algorithm with $\mathcal{O}\left(\sqrt{\kappa_x\kappa_y}\log\frac{1}{\epsilon}\right)$ gradient evaluation complexity. We design our algorithm in three steps: (i) we reformulate the original problem as a minimization problem via the pointwise conjugate function; (ii) we apply a specific variant of the proximal point algorithm to the reformulated problem; (iii) we compute the proximal operator inexactly using the optimal algorithm for operator norm reduction in monotone inclusions.
    Truncated Emphatic Temporal Difference Methods for Prediction and Control. (arXiv:2108.05338v2 [cs.LG] UPDATED)
    Emphatic Temporal Difference (TD) methods are a class of off-policy Reinforcement Learning (RL) methods involving the use of followon traces. Despite the theoretical success of emphatic TD methods in addressing the notorious deadly triad of off-policy RL, there are still two open problems. First, followon traces typically suffer from large variance, making them hard to use in practice. Second, though Yu (2015) confirms the asymptotic convergence of some emphatic TD methods for prediction problems, there is still no finite sample analysis for any emphatic TD method for prediction, much less control. In this paper, we address those two open problems simultaneously via using truncated followon traces in emphatic TD methods. Unlike the original followon traces, which depend on all previous history, truncated followon traces depend on only finite history, reducing variance and enabling the finite sample analysis of our proposed emphatic TD methods for both prediction and control.
    Scream Detection in Heavy Metal Music. (arXiv:2205.05580v1 [cs.SD])
    Harsh vocal effects such as screams or growls are far more common in heavy metal vocals than the traditionally sung vocal. This paper explores the problem of detection and classification of extreme vocal techniques in heavy metal music, specifically the identification of different scream techniques. We investigate the suitability of various feature representations, including cepstral, spectral, and temporal features as input representations for classification. The main contributions of this work are (i) a manually annotated dataset comprised of over 280 minutes of heavy metal songs of various genres with a statistical analysis of occurrences of different extreme vocal techniques in heavy metal music, and (ii) a systematic study of different input feature representations for the classification of heavy metal vocals.
    A Survey on Fairness for Machine Learning on Graphs. (arXiv:2205.05396v1 [cs.LG])
    Nowadays, the analysis of complex phenomena modeled by graphs plays a crucial role in many real-world application domains where decisions can have a strong societal impact. However, numerous studies and papers have recently revealed that machine learning models could lead to potential disparate treatment between individuals and unfair outcomes. In that context, algorithmic contributions for graph mining are not spared by the problem of fairness and present some specific challenges related to the intrinsic nature of graphs: (1) graph data is non-IID, which may invalidate many existing studies in fair machine learning, (2) suitable metric definitions are needed to assess the different types of fairness with relational data, and (3) it is algorithmically difficult to find a good trade-off between model accuracy and fairness. This survey is the first one dedicated to fairness for relational data. It aims to present a comprehensive review of state-of-the-art techniques in fairness on graph mining and identify the open challenges and future trends. In particular, we start by presenting several sensitive application domains and the associated graph mining tasks with a focus on edge prediction and node classification in the sequel. We also recall the different metrics proposed to evaluate potential bias at different levels of the graph mining process; then we provide a comprehensive overview of recent contributions in the domain of fair machine learning for graphs, that we classify into pre-processing, in-processing and post-processing models. We also propose to describe existing graph data, synthetic and real-world benchmarks. Finally, we present in detail five potential promising directions to advance research in studying algorithmic fairness on graphs.
    Symphony Generation with Permutation Invariant Language Model. (arXiv:2205.05448v1 [cs.SD])
    In this work, we present a symbolic symphony music generation solution, SymphonyNet, based on a permutation invariant language model. To bridge the gap between text generation and symphony generation task, we propose a novel Multi-track Multi-instrument Repeatable (MMR) representation with particular 3-D positional embedding and a modified Byte Pair Encoding algorithm (Music BPE) for music tokens. A novel linear transformer decoder architecture is introduced as a backbone for modeling extra-long sequences of symphony tokens. Meanwhile, we train the decoder to learn automatic orchestration as a joint task by masking instrument information from the input. We also introduce a large-scale symbolic symphony dataset for the advance of symphony generation research. Our empirical results show that our proposed approach can generate coherent, novel, complex and harmonious symphony compared to human composition, which is the pioneer solution for multi-track multi-instrument symbolic music generation.
    Quantification of Actual Road User Behavior on the Basis of Given Traffic Rules. (arXiv:2202.09269v3 [cs.RO] UPDATED)
    Driving on roads is restricted by various traffic rules, aiming to ensure safety for all traffic participants. However, human road users usually do not adhere to these rules strictly, resulting in varying degrees of rule conformity. Such deviations from given rules are key components of today's road traffic. In autonomous driving, robotic agents can disturb traffic flow, when rule deviations are not taken into account. In this paper, we present an approach to derive the distribution of degrees of rule conformity from human driving data. We demonstrate our method with the Waymo Open Motion dataset and Safety Distance and Speed Limit rules.
    Re-evaluating Word Mover's Distance. (arXiv:2105.14403v2 [cs.LG] UPDATED)
    The word mover's distance (WMD) is a fundamental technique for measuring the similarity of two documents. As the crux of WMD, it can take advantage of the underlying geometry of the word space by employing an optimal transport formulation. The original study on WMD reported that WMD outperforms classical baselines such as bag-of-words (BOW) and TF-IDF by significant margins in various datasets. In this paper, we point out that the evaluation in the original study could be misleading. We re-evaluate the performances of WMD and the classical baselines and find that the classical baselines are competitive with WMD if we employ appropriate preprocessing, i.e., L1 normalization. In addition, we introduce an analogy between WMD and L1-normalized BOW and find that not only the performance of WMD but also the distance values resemble those of BOW in high dimensional spaces.
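    To make the L1-normalized bag-of-words baseline mentioned above concrete, here is a minimal sketch: each document becomes a vector of word counts divided by the total count, and two documents are compared by the L1 distance between those vectors. The helper names, the whitespace tokenization and the two toy sentences are assumptions for this example, not the paper's experimental setup.

```python
from collections import Counter

def l1_normalized_bow(tokens, vocabulary):
    """Word counts over `vocabulary`, scaled so the entries sum to 1."""
    counts = Counter(tokens)
    total = sum(counts[w] for w in vocabulary)
    return [counts[w] / total if total else 0.0 for w in vocabulary]

def l1_distance(u, v):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))

# Hypothetical documents, tokenized by whitespace.
doc_a = "the cat sat on the mat".split()
doc_b = "the dog sat on the log".split()
vocabulary = sorted(set(doc_a) | set(doc_b))

bow_a = l1_normalized_bow(doc_a, vocabulary)
bow_b = l1_normalized_bow(doc_b, vocabulary)
distance = l1_distance(bow_a, bow_b)
```

    Here the documents differ in four words that each carry weight 1/6, so the distance is 4/6. Because every normalized vector sums to 1, the L1 distance always lies between 0 and 2, regardless of document length.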
    A Ubiquitous Unifying Degeneracy in Two-Body Microlensing Systems. (arXiv:2111.13696v2 [astro-ph.EP] UPDATED)
    While gravitational microlensing by planetary systems provides unique vistas on the properties of exoplanets, observations of a given 2-body microlensing event can often be interpreted with multiple distinct physical configurations. Such ambiguities are typically attributed to the close-wide and inner-outer types of degeneracies that arise from transformation invariances and symmetries of microlensing caustics. However, there remain unexplained inconsistencies between aforementioned theories and observations. Here, leveraging a fast machine learning inference framework, we present the discovery of the offset degeneracy, which concerns a magnification-matching behaviour on the lens-axis and is formulated independent of caustics. This offset degeneracy unifies the close-wide and inner-outer degeneracies, generalises to resonant topologies, and upon reanalysis, not only appears ubiquitous in previously published planetary events with 2-fold degenerate solutions, but also resolves prior inconsistencies. Our analysis demonstrates that degenerate caustics do not strictly result in degenerate magnifications and that the commonly invoked close-wide degeneracy essentially never arises in actual events. Moreover, it is shown that parameters in offset degenerate configurations are related by a simple expression. This suggests the existence of a deeper symmetry in the equations governing 2-body lenses than previously recognised.
    Influence-Driven Data Poisoning in Graph-Based Semi-Supervised Classifiers. (arXiv:2012.07381v2 [cs.LG] UPDATED)
    Graph-based Semi-Supervised Learning (GSSL) is a practical solution to learn from a limited amount of labelled data together with a vast amount of unlabelled data. However, due to their reliance on the known labels to infer the unknown labels, these algorithms are sensitive to data quality. It is therefore essential to study the potential threats related to the labelled data, more specifically, label poisoning. In this paper, we propose a novel data poisoning method which efficiently approximates the result of label inference to identify the inputs which, if poisoned, would produce the highest number of incorrectly inferred labels. We extensively evaluate our approach on three classification problems under 24 different experimental settings each. Compared to the state of the art, our influence-driven attack produces an average increase in error rate that is 50% higher, while being faster by multiple orders of magnitude. Moreover, our method can inform engineers of inputs that deserve investigation (relabelling them) before training the learning model. We show that relabelling one-third of the poisoned inputs (selected based on their influence) reduces the poisoning effect by 50%.
    Secure Federated Learning for Neuroimaging. (arXiv:2205.05249v1 [cs.LG])
    The amount of biomedical data continues to grow rapidly. However, the ability to collect data from multiple sites for joint analysis remains challenging due to security, privacy, and regulatory concerns. We present a Secure Federated Learning architecture, MetisFL, which enables distributed training of neural networks over multiple data sources without sharing data. Each site trains the neural network over its private data for some time, then shares the neural network parameters (i.e., weights, gradients) with a Federation Controller, which in turn aggregates the local models, sends the resulting community model back to each site, and the process repeats. Our architecture provides strong security and privacy. First, sample data never leaves a site. Second, neural parameters are encrypted before transmission and the community model is computed under fully-homomorphic encryption. Finally, we use information-theoretic methods to limit information leakage from the neural model to prevent a curious site from performing membership attacks. We demonstrate this architecture in neuroimaging. Specifically, we investigate training neural models to classify Alzheimer's disease, and estimate Brain Age, from magnetic resonance imaging datasets distributed across multiple sites, including heterogeneous environments where sites have different amounts of data, statistical distributions, and computational capabilities.
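    The aggregation step described above (sites share parameters, a controller combines them into a community model) can be sketched as a weighted average. This is only an illustration: the abstract does not state the aggregation rule, so weighting each site by its number of samples is an assumption, and the encryption and privacy mechanisms are omitted entirely. The name `federated_average` and the toy numbers are hypothetical.

```python
def federated_average(site_weights, site_sizes):
    """Combine per-site parameter vectors into one community vector,
    weighting each site by its (assumed) number of training samples."""
    total = sum(site_sizes)
    n_params = len(site_weights[0])
    return [
        sum(w[i] * n for w, n in zip(site_weights, site_sizes)) / total
        for i in range(n_params)
    ]

# Hypothetical parameters from three sites with different data volumes.
site_weights = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
site_sizes = [10, 30, 60]
community = federated_average(site_weights, site_sizes)
```

    With these numbers the community parameters work out to 4.0 and 5.0, pulled toward the site with the most data. In a real round this vector would be sent back to every site before the next local training phase.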
    Theory and Implementation of Process and Temperature Scalable Shape-based CMOS Analog Circuits. (arXiv:2205.05664v1 [cs.AR])
    Analog computing is attractive compared to its digital counterparts due to its potential for achieving high compute density and energy efficiency. However, device-to-device variability and challenges in porting existing designs to advanced process nodes have posed a major hindrance in harnessing the full potential of analog computations for Machine Learning (ML) applications. This work proposes a novel analog computing framework for designing an analog ML processor similar to that of a digital design, where the designs can be scaled and ported to advanced process nodes without architectural changes. At the core of our work lies shape-based analog computing (S-AC). It utilizes device primitives to yield a robust proto-function through which other non-linear shapes can be derived. The S-AC paradigm also allows the user to trade off computational precision with silicon circuit area and power, thus allowing users to build a truly power-efficient and scalable analog architecture where the same synthesized analog circuit can operate across different biasing regimes of transistors and simultaneously scale across process nodes. As a proof of concept, we show the implementation of commonly used mathematical functions for carrying out standard ML tasks in both planar CMOS 180nm and FinFET 7nm process nodes. The synthesized shape-based ML architecture has been demonstrated for its classification accuracy on standard data sets at different process nodes.
    Generating Annotated Training Data for 6D Object Pose Estimation in Operational Environments with Minimal User Interaction. (arXiv:2103.09696v3 [cs.RO] UPDATED)
    Recently developed deep neural networks achieved state-of-the-art results in the subject of 6D object pose estimation for robot manipulation. However, those supervised deep learning methods require expensive annotated training data. Current methods for reducing those costs frequently use synthetic data from simulations, but rely on expert knowledge and suffer from the "domain gap" when shifting to the real world. Here, we present a proof of concept for a novel approach of autonomously generating annotated training data for 6D object pose estimation. This approach is designed for learning new objects in operational environments while requiring little interaction and no expertise on the part of the user. We evaluate our autonomous data generation approach in two grasping experiments, where we achieve a similar grasping success rate as related work on a non-autonomously generated data set.
    Utilizing coarse-grained data in low-data settings for event extraction. (arXiv:2205.05468v1 [cs.CL])
    Annotating text data for event information extraction systems is hard, expensive, and error-prone. We investigate the feasibility of integrating coarse-grained data (document or sentence labels), which is far more feasible to obtain, instead of annotating more documents. We utilize a multi-task model with two auxiliary tasks, document and sentence binary classification, in addition to the main task of token classification. We perform a series of experiments with varying data regimes for the aforementioned integration. Results show that while introducing extra coarse-grained data offers greater improvement and robustness, a gain is still possible with only the addition of negative documents that have no information on any event.
    Towards An Efficient Approach for the Nonconvex $\ell_p$ Ball Projection: Algorithm and Analysis. (arXiv:2101.01350v6 [math.OC] UPDATED)
    This paper primarily focuses on computing the Euclidean projection of a vector onto the $\ell_{p}$ ball in which $p\in(0,1)$. Such a problem emerges as the core building block in statistical machine learning and signal processing tasks because of its ability to promote the sparsity of the desired solution. However, efficient numerical algorithms for finding the projections are still not available, particularly in large-scale optimization. To meet this challenge, we first derive the first-order necessary optimality conditions of this problem. Based on this characterization, we develop a novel numerical approach for computing the stationary point by solving a sequence of projections onto the reweighted $\ell_{1}$-balls. This method is practically simple to implement and computationally efficient. Moreover, the proposed algorithm is shown to converge uniquely under mild conditions and has a worst-case $O(1/\sqrt{k})$ convergence rate. Numerical experiments demonstrate the efficiency of our proposed algorithm.
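For reference, the projection problem described in this abstract can be written out as below. The symbols $y$ (the vector being projected), $\gamma$ (the ball radius) and $n$ (the dimension) are notation assumed here, since the abstract does not name them.

```latex
\min_{x \in \mathbb{R}^n} \ \tfrac{1}{2}\,\|x - y\|_2^2
\quad \text{subject to} \quad
\|x\|_p^p = \sum_{i=1}^{n} |x_i|^p \le \gamma,
\qquad p \in (0,1).
```

Because $p < 1$, the feasible set is nonconvex, which is why the abstract speaks of first-order necessary conditions and stationary points rather than a unique global projection.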
    Towards Model Agnostic Federated Learning Using Knowledge Distillation. (arXiv:2110.15210v2 [cs.LG] UPDATED)
    Is it possible to design a universal API for federated learning using which an ad-hoc group of data-holders (agents) collaborate with each other and perform federated learning? Such an API would necessarily need to be model-agnostic, i.e., make no assumption about the model architecture being used by the agents, and also cannot rely on having representative public data at hand. Knowledge distillation (KD) is the obvious tool of choice to design such protocols. However, surprisingly, we show that most natural KD-based federated learning protocols have poor performance. To investigate this, we propose a new theoretical framework, Federated Kernel ridge regression, which can capture both model heterogeneity as well as data heterogeneity. Our analysis shows that the degradation is largely due to a fundamental limitation of knowledge distillation under data heterogeneity. We further validate our framework by analyzing and designing new protocols based on KD. Their performance on real world experiments using neural networks, though still unsatisfactory, closely matches our theoretical predictions.
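As background for the KD-based protocols mentioned above, here is a minimal sketch of a generic temperature-scaled distillation loss (KL divergence from teacher to student predictions). It is the standard textbook form, not the protocol analysed in the paper; the function names and the temperature value of 2.0 are assumptions chosen for illustration.

```python
import numpy as np

def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over the last axis."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Mean KL(teacher || student) on temperature-softened predictions."""
    log_t = log_softmax(teacher_logits, temperature)
    log_s = log_softmax(student_logits, temperature)
    kl = np.sum(np.exp(log_t) * (log_t - log_s), axis=-1)
    return float(kl.mean())

teacher = np.array([[2.0, 0.5, -1.0]])
student = np.array([[0.1, 0.2, 0.3]])
loss_same = distillation_loss(teacher, teacher)   # identical predictions
loss_diff = distillation_loss(student, teacher)   # mismatched predictions
```

The loss depends only on model outputs, which is what makes distillation attractive when agents use different architectures.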
    A Word is Worth A Thousand Dollars: Adversarial Attack on Tweets Fools Stock Prediction. (arXiv:2205.01094v2 [cs.CR] UPDATED)
    More and more investors and machine learning models rely on social media (e.g., Twitter and Reddit) to gather real-time information and sentiment to predict stock price movements. Although text-based models are known to be vulnerable to adversarial attacks, whether stock prediction models have similar vulnerability is underexplored. In this paper, we experiment with a variety of adversarial attack configurations to fool three stock prediction victim models. We address the task of adversarial generation by solving combinatorial optimization problems with semantics and budget constraints. Our results show that the proposed attack method can achieve consistent success rates and cause significant monetary loss in trading simulation by simply concatenating a perturbed but semantically similar tweet.
    Salient Object Detection via Bounding-box Supervision. (arXiv:2205.05245v1 [cs.CV])
    The success of fully supervised saliency detection models depends on a large amount of pixel-wise labeling. In this paper, we work on bounding-box based weakly-supervised saliency detection to relieve the labeling effort. Given the bounding box annotation, we observe that pixels inside the bounding box may contain extensive labeling noise. However, as a large amount of background is excluded, the foreground bounding box region contains a less complex background, making it possible to perform handcrafted features-based saliency detection with only the cropped foreground region. As the conventional handcrafted features are not representative enough, leading to noisy saliency maps, we further introduce a structure-aware self-supervised loss to regularize the structure of the prediction. Further, we claim that pixels outside the bounding box should be background, thus a partial cross-entropy loss function can be used to accurately localize the background region. Experimental results on six benchmark RGB saliency datasets illustrate the effectiveness of our model.
    Social Inclusion in Curated Contexts: Insights from Museum Practices. (arXiv:2205.05192v1 [cs.LG])
    Artificial intelligence literature suggests that minority and fragile communities in society can be negatively impacted by machine learning algorithms due to inherent biases in the design process, which lead to socially exclusive decisions and policies. Faced with similar challenges in dealing with an increasingly diversified audience, the museum sector has seen changes in theory and practice, particularly in the areas of representation and meaning-making. While rarity and grandeur used to be at the centre stage of the early museum practices, folk life and museums' relationships with the diverse communities they serve become a widely integrated part of the contemporary practices. These changes address issues of diversity and accessibility in order to offer more socially inclusive services. Drawing on these changes and reflecting back on the AI world, we argue that the museum experience provides useful lessons for building AI with socially inclusive approaches, especially in situations in which both a collection and access to it will need to be curated or filtered, as frequently happens in search engines, recommender systems and digital libraries. We highlight three principles: (1) Instead of upholding the value of neutrality, practitioners are aware of the influences of their own backgrounds and those of others on their work. By not claiming to be neutral but practising cultural humility, the chances of addressing potential biases can be increased. (2) There should be room for situational interpretation beyond the stages of data collection and machine learning. Before applying models and predictions, the contexts in which relevant parties exist should be taken into account. (3) Community participation serves the needs of communities and has the added benefit of bringing practitioners and communities together.
    Learning Multitask Gaussian Bayesian Networks. (arXiv:2205.05343v1 [stat.ML])
    Major depressive disorder (MDD) requires study of brain functional connectivity alterations for patients, which can be uncovered by resting-state functional magnetic resonance imaging (rs-fMRI) data. We consider the problem of identifying alterations of brain functional connectivity for a single MDD patient. This is particularly difficult since the amount of data collected during an fMRI scan is too limited to provide sufficient information for individual analysis. Additionally, rs-fMRI data usually has the characteristics of incompleteness, sparsity, variability, high dimensionality and high noise. To address these problems, we propose a multitask Gaussian Bayesian network (MTGBN) framework capable of identifying individual disease-induced alterations for MDD patients. We assume that such disease-induced alterations show some degrees of similarity with the tool to learn such network structures from observations to understanding of how systems are structured jointly from related tasks. First, we treat each patient in a class of observation as a task and then learn the Gaussian Bayesian networks (GBNs) of this data class by learning from all tasks that share a default covariance matrix that encodes prior knowledge. This setting can help us to learn more information from limited data. Next, we derive a closed-form formula of the complete likelihood function and use the Monte-Carlo Expectation-Maximization (MCEM) algorithm to search for the approximately best Bayesian network structures efficiently. Finally, we assess the performance of our methods with simulated and real-world rs-fMRI data.
    CNN-LSTM Based Multimodal MRI and Clinical Data Fusion for Predicting Functional Outcome in Stroke Patients. (arXiv:2205.05545v1 [eess.IV])
    Clinical outcome prediction plays an important role in stroke patient management. From a machine learning point-of-view, one of the main challenges is dealing with heterogeneous data at patient admission, i.e. the image data which are multidimensional and the clinical data which are scalars. In this paper, a multimodal convolutional neural network - long short-term memory (CNN-LSTM) based ensemble model is proposed. For each MR image module, a dedicated network provides preliminary prediction of the clinical outcome using the modified Rankin scale (mRS). The final mRS score is obtained by merging the preliminary probabilities of each module dedicated to a specific type of MR image weighted by the clinical metadata, here age or the National Institutes of Health Stroke Scale (NIHSS). The experimental results demonstrate that the proposed model surpasses the baselines and offers an original way to automatically encode the spatio-temporal context of MR images in a deep learning architecture. The highest AUC (0.77) was achieved for the proposed model with NIHSS.
    On Distributed Adaptive Optimization with Gradient Compression. (arXiv:2205.05632v1 [stat.ML])
    We study COMP-AMS, a distributed optimization framework based on gradient averaging and the adaptive AMSGrad algorithm. Gradient compression with error feedback is applied to reduce the communication cost in the gradient transmission process. Our convergence analysis of COMP-AMS shows that such a compressed gradient averaging strategy yields the same convergence rate as standard AMSGrad, and also exhibits the linear speedup effect w.r.t. the number of local workers. Compared with recently proposed protocols on distributed adaptive methods, COMP-AMS is simple and convenient. Numerical experiments are conducted to justify the theoretical findings, and demonstrate that the proposed method can achieve the same test accuracy as the full-gradient AMSGrad with substantial communication savings. With its simplicity and efficiency, COMP-AMS can serve as a useful distributed training framework for adaptive gradient methods.
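The idea of gradient compression with error feedback can be sketched on the worker side as follows. The abstract does not state which compressor is used, so top-k sparsification is an assumption here, as are the names `top_k_compress` and `ErrorFeedbackWorker`; the server-side averaging and AMSGrad update are not shown.

```python
import numpy as np

def top_k_compress(vec, k):
    """Keep the k largest-magnitude entries of vec and zero the rest."""
    out = np.zeros_like(vec)
    if k > 0:
        idx = np.argpartition(np.abs(vec), -k)[-k:]
        out[idx] = vec[idx]
    return out

class ErrorFeedbackWorker:
    """Worker that remembers what compression dropped and re-adds it later."""

    def __init__(self, dim, k):
        self.residual = np.zeros(dim)
        self.k = k

    def compress(self, grad):
        corrected = grad + self.residual          # add back past compression error
        sent = top_k_compress(corrected, self.k)  # what actually gets transmitted
        self.residual = corrected - sent          # store the new error
        return sent

worker = ErrorFeedbackWorker(dim=4, k=1)
g = np.array([0.5, -2.0, 1.5, 0.1])
first = worker.compress(g)    # transmits only the -2.0 entry
second = worker.compress(g)   # accumulated error makes index 2 the largest
```

Nothing is lost permanently: the transmitted vectors plus the stored residual always sum to the gradients seen so far.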
    RvS: What is Essential for Offline RL via Supervised Learning?. (arXiv:2112.10751v2 [cs.LG] UPDATED)
    Recent work has shown that supervised learning alone, without temporal difference (TD) learning, can be remarkably effective for offline RL. When does this hold true, and which algorithmic components are necessary? Through extensive experiments, we boil supervised learning for offline RL down to its essential elements. In every environment suite we consider, simply maximizing likelihood with a two-layer feedforward MLP is competitive with state-of-the-art results of substantially more complex methods based on TD learning or sequence modeling with Transformers. Carefully choosing model capacity (e.g., via regularization or architecture) and choosing which information to condition on (e.g., goals or rewards) are critical for performance. These insights serve as a field guide for practitioners doing Reinforcement Learning via Supervised Learning (which we coin "RvS learning"). They also probe the limits of existing RvS methods, which are comparatively weak on random data, and suggest a number of open problems.
    Self Reward Design with Fine-grained Interpretability. (arXiv:2112.15034v2 [cs.LG] UPDATED)
    Transparency and fairness issues stem from the black-box nature of deep neural networks (DNN). They are relevant to Deep Reinforcement Learning, which also uses DNNs to learn its policy, value functions, etc. This paper proposes a way to circumvent the issues through the bottom-up design of neural networks (NN) with detailed interpretability, where each neuron or layer has its own meaning and utility that corresponds to a humanly understandable concept. With deliberate design, we show that lavaland problems can be solved using an NN model with few parameters. Furthermore, we introduce the Self Reward Design (SRD), inspired by the Inverse Reward Design, so that our interpretable design can (1) solve the problem by pure design (although imperfectly), (2) be optimized via SRD, and (3) perform avoidance of unknown states by recognizing the inactivations of neurons aggregated as the activation in \(w_{unknown}\).
    RepSR: Training Efficient VGG-style Super-Resolution Networks with Structural Re-Parameterization and Batch Normalization. (arXiv:2205.05671v1 [cs.CV])
    This paper explores training efficient VGG-style super-resolution (SR) networks with the structural re-parameterization technique. The general pipeline of re-parameterization is to train networks with multi-branch topology first, and then merge them into standard 3x3 convolutions for efficient inference. In this work, we revisit those primary designs and investigate essential components for re-parameterizing SR networks. First of all, we find that batch normalization (BN) is important to bring training non-linearity and improve the final performance. However, BN is typically ignored in SR, as it usually degrades the performance and introduces unpleasant artifacts. We carefully analyze the cause of BN issue and then propose a straightforward yet effective solution. In particular, we first train SR networks with mini-batch statistics as usual, and then switch to using population statistics at the later training period. While we have successfully re-introduced BN into SR, we further design a new re-parameterizable block tailored for SR, namely RepSR. It consists of a clean residual path and two expand-and-squeeze convolution paths with the modified BN. Extensive experiments demonstrate that our simple RepSR is capable of achieving superior performance to previous SR re-parameterization methods among different model sizes. In addition, our RepSR can achieve a better trade-off between performance and actual running time (throughput) than previous SR methods. Codes will be available at https://github.com/TencentARC/RepSR.
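The general re-parameterization pipeline mentioned above (train with several branches, then merge them into one 3x3 convolution) relies on convolution being linear. A minimal single-channel sketch without batch normalization follows; it is not the RepSR block itself, and the helper names `conv2d_same` and `merge_branches` are hypothetical.

```python
import numpy as np

def conv2d_same(x, kernel):
    """Naive single-channel 2D cross-correlation with zero padding ('same' size)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for r in range(x.shape[0]):
        for c in range(x.shape[1]):
            out[r, c] = np.sum(padded[r:r + kh, c:c + kw] * kernel)
    return out

def merge_branches(k3, k1):
    """Fold a 1x1 branch and an identity branch into a single 3x3 kernel."""
    merged = k3.copy()
    merged[1, 1] += k1[0, 0]  # a 1x1 kernel only touches the centre tap
    merged[1, 1] += 1.0       # identity equals a 1x1 kernel with weight 1
    return merged

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 5))
k3 = rng.normal(size=(3, 3))
k1 = rng.normal(size=(1, 1))

multi_branch = conv2d_same(x, k3) + conv2d_same(x, k1) + x  # training-time form
single_conv = conv2d_same(x, merge_branches(k3, k1))        # inference-time form
```

Both forms give the same output, so the extra branches cost nothing at inference time.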
    Post-Hoc Explanations Fail to Achieve their Purpose in Adversarial Contexts. (arXiv:2201.10295v2 [cs.LG] UPDATED)
    Existing and planned legislation stipulates various obligations to provide information about machine learning algorithms and their functioning, often interpreted as obligations to "explain". Many researchers suggest using post-hoc explanation algorithms for this purpose. In this paper, we combine legal, philosophical and technical arguments to show that post-hoc explanation algorithms are unsuitable to achieve the law's objectives. Indeed, most situations where explanations are requested are adversarial, meaning that the explanation provider and receiver have opposing interests and incentives, so that the provider might manipulate the explanation for her own ends. We show that this fundamental conflict cannot be resolved because of the high degree of ambiguity of post-hoc explanations in realistic application scenarios. As a consequence, post-hoc explanation algorithms are unsuitable to achieve the transparency objectives inherent to the legal norms. Instead, there is a need to more explicitly discuss the objectives underlying "explainability" obligations as these can often be better achieved through other mechanisms. There is an urgent need for a more open and honest discussion regarding the potential and limitations of post-hoc explanations in adversarial contexts, in particular in light of the current negotiations of the European Union's draft Artificial Intelligence Act.
    Hierarchical Collaborative Hyper-parameter Tuning. (arXiv:2205.05272v1 [cs.LG])
    Hyper-parameter tuning is among the most critical stages in building machine learning solutions. This paper demonstrates how multi-agent systems can be utilized to develop a distributed technique for determining near-optimal values for any arbitrary set of hyper-parameters in a machine learning model. The proposed method employs a distributedly formed hierarchical agent-based architecture for the cooperative searching procedure of tuning hyper-parameter values. The presented generic model is used to develop a guided randomized agent-based tuning technique, and its behavior is investigated in both machine learning and global function optimization applications. According to the empirical results, the proposed model outperformed both of its underlying randomized tuning strategies in terms of classification error and function evaluations, notably in higher numbers of dimensions.
    Efficient Automated Deep Learning for Time Series Forecasting. (arXiv:2205.05511v1 [cs.LG])
    Recent years have witnessed tremendously improved efficiency of Automated Machine Learning (AutoML), especially Automated Deep Learning (AutoDL) systems, but recent work focuses on tabular, image, or NLP tasks. So far, little attention has been paid to general AutoDL frameworks for time series forecasting, despite the enormous success in applying different novel architectures to such tasks. In this paper, we propose an efficient approach for the joint optimization of neural architecture and hyperparameters of the entire data processing pipeline for time series forecasting. In contrast to common NAS search spaces, we designed a novel neural architecture search space covering various state-of-the-art architectures, allowing for an efficient macro-search over different DL approaches. To efficiently search in such a large configuration space, we use Bayesian optimization with multi-fidelity optimization. We empirically study several different budget types enabling efficient multi-fidelity optimization on different forecasting datasets. Furthermore, we compared our resulting system, dubbed Auto-PyTorch-TS, against several established baselines and show that it significantly outperforms all of them across several datasets.
    DoubleMatch: Improving Semi-Supervised Learning with Self-Supervision. (arXiv:2205.05575v1 [cs.LG])
    Following the success of supervised learning, semi-supervised learning (SSL) is now becoming increasingly popular. SSL is a family of methods, which in addition to a labeled training set, also use a sizable collection of unlabeled data for fitting a model. Most of the recent successful SSL methods are based on pseudo-labeling approaches: letting confident model predictions act as training labels. While these methods have shown impressive results on many benchmark datasets, a drawback of this approach is that not all unlabeled data are used during training. We propose a new SSL algorithm, DoubleMatch, which combines the pseudo-labeling technique with a self-supervised loss, enabling the model to utilize all unlabeled data in the training process. We show that this method achieves state-of-the-art accuracies on multiple benchmark datasets while also reducing training times compared to existing SSL methods. Code is available at https://github.com/walline/doublematch.
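The pseudo-labeling step described above, and the drawback that low-confidence samples go unused, can be illustrated with a short sketch. The confidence threshold of 0.95 and the function name `pseudo_labels` are assumptions for illustration; the self-supervised loss that DoubleMatch adds is not shown.

```python
import numpy as np

def pseudo_labels(probs, threshold=0.95):
    """Return predicted classes and a mask of samples confident enough to use."""
    probs = np.asarray(probs)
    labels = probs.argmax(axis=1)
    mask = probs.max(axis=1) >= threshold
    return labels, mask

# Model predictions on three unlabeled samples (rows sum to 1).
probs = np.array([[0.97, 0.02, 0.01],
                  [0.40, 0.35, 0.25],
                  [0.03, 0.01, 0.96]])
labels, mask = pseudo_labels(probs)
used_fraction = mask.mean()  # share of unlabeled data contributing to the loss
```

Here the second sample falls below the threshold and is dropped from the pseudo-label loss, which is the unused data the abstract refers to.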
    Deep fusion of gray level co-occurrence matrices for lung nodule classification. (arXiv:2205.05123v1 [eess.IV])
    Lung cancer is a severe menace to human health: millions of people die because of late diagnosis, so it is vital to detect the disease as early as possible. Analysis of computed tomography (CT) chest scans is considered one of the efficient solutions for detecting and classifying lung nodules. The need for high accuracy in analyzing CT scan images of the lung is one of the crucial challenges in detecting and classifying lung cancer. A new long short-term memory (LSTM) based deep fusion structure is introduced, in which texture features computed from lung nodules through new volumetric grey-level co-occurrence matrix (GLCM) computations are used to classify the nodules as benign, malignant, or ambiguous. An improved Otsu segmentation method combined with the water strider optimization algorithm (WSA) is proposed to detect the lung nodules. Otsu-WSA thresholding can overcome the restrictions present in previous thresholding methods. Extended experiments are run to assess this fusion structure by considering 2D-GLCM computation-based 2D-slice fusion, and an approximation of the 3D-GLCM with a volumetric 2.5D-GLCM computation-based LSTM fusion structure. The proposed methods are trained and assessed on the LIDC-IDRI dataset, where 94.4%, 91.6%, and 95.8% accuracy, sensitivity, and specificity, respectively, are obtained for 2D-GLCM fusion, and 97.33%, 96%, and 98% for 2.5D-GLCM fusion. The corresponding figures for 3D-GLCM fusion are 98.7%, 98%, and 99%. The obtained results and analysis indicate that the WSA-Otsu method requires less execution time and yields a more accurate thresholding process. It is found that the 3D-GLCM based LSTM outperforms its counterparts.
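For readers unfamiliar with grey-level co-occurrence matrices, a plain 2D GLCM and one texture feature derived from it can be sketched as below. This is the standard 2D definition only; the volumetric 2.5D and 3D variants from the paper are not reproduced, and the function name `glcm` and the toy patch are assumptions for illustration.

```python
import numpy as np

def glcm(image, levels, dx=1, dy=0, symmetric=True, normed=True):
    """Count how often grey level i occurs next to grey level j at offset (dy, dx)."""
    image = np.asarray(image)
    rows, cols = image.shape
    counts = np.zeros((levels, levels), dtype=float)
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r, c], image[r2, c2]] += 1
    if symmetric:
        counts = counts + counts.T          # count each pair in both directions
    if normed and counts.sum() > 0:
        counts = counts / counts.sum()      # turn counts into joint probabilities
    return counts

# Toy 3x3 patch quantized to 3 grey levels, horizontal neighbour offset.
patch = np.array([[0, 0, 1],
                  [1, 2, 2],
                  [0, 1, 2]])
p = glcm(patch, levels=3)
i, j = np.indices(p.shape)
contrast = float(np.sum(p * (i - j) ** 2))  # a common GLCM texture feature
```

Features such as `contrast`, computed per slice or per volume, are the kind of texture descriptors that would then be fed to a classifier.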
    Best of Both Worlds: Multi-task Audio-Visual Automatic Speech Recognition and Active Speaker Detection. (arXiv:2205.05206v1 [eess.AS])
    Under noisy conditions, automatic speech recognition (ASR) can greatly benefit from the addition of visual signals coming from a video of the speaker's face. However, when multiple candidate speakers are visible this traditionally requires solving a separate problem, namely active speaker detection (ASD), which entails selecting at each moment in time which of the visible faces corresponds to the audio. Recent work has shown that we can solve both problems simultaneously by employing an attention mechanism over the competing video tracks of the speakers' faces, at the cost of sacrificing some accuracy on active speaker detection. This work closes this gap in active speaker detection accuracy by presenting a single model that can be jointly trained with a multi-task loss. By combining the two tasks during training we improve the ASD classification accuracy by approximately 25%, while simultaneously improving the ASR performance when compared to the multi-person baseline trained exclusively for ASR.
    Quantum Self-Attention Neural Networks for Text Classification. (arXiv:2205.05625v1 [quant-ph])
    An emerging direction of quantum computing is to establish meaningful quantum applications in various fields of artificial intelligence, including natural language processing (NLP). Although some efforts based on syntactic analysis have opened the door to research in Quantum NLP (QNLP), limitations such as heavy syntactic preprocessing and syntax-dependent network architecture make them impracticable on larger and real-world data sets. In this paper, we propose a new simple network architecture, called the quantum self-attention neural network (QSANN), which can make up for these limitations. Specifically, we introduce the self-attention mechanism into quantum neural networks and then utilize a Gaussian projected quantum self-attention serving as a sensible quantum version of self-attention. As a result, QSANN is effective and scalable on larger data sets and has the desirable property of being implementable on near-term quantum devices. In particular, our QSANN outperforms the best existing QNLP model based on syntactic analysis as well as a simple classical self-attention neural network in numerical experiments of text classification tasks on public data sets. We further show that our method exhibits robustness to low-level quantum noises.
    Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis. (arXiv:2205.05662v1 [cs.LG])
    Advanced deep neural networks (DNNs), designed by either human or AutoML algorithms, are growing increasingly complex. Diverse operations are connected by complicated connectivity patterns, e.g., various types of skip connections. Those topological compositions are empirically effective and observed to smooth the loss landscape and facilitate the gradient flow in general. However, it remains elusive to derive any principled understanding of their effects on the DNN capacity or trainability, and to understand why or in which aspect one specific connectivity pattern is better than another. In this work, we theoretically characterize the impact of connectivity patterns on the convergence of DNNs under gradient descent training in fine granularity. By analyzing a wide network's Neural Network Gaussian Process (NNGP), we are able to depict how the spectrum of an NNGP kernel propagates through a particular connectivity pattern, and how that affects the bound of convergence rates. As one practical implication of our results, we show that by a simple filtration on "unpromising" connectivity patterns, we can trim down the number of models to evaluate, and significantly accelerate the large-scale neural architecture search without any overhead. Codes will be released at https://github.com/chenwydj/architecture_convergence.
    Human Language Modeling. (arXiv:2205.05128v1 [cs.CL])
    Natural language is generated by people, yet traditional language modeling views words or documents as if generated independently. Here, we propose human language modeling (HuLM), a hierarchical extension to the language modeling problem whereby a human level exists to connect sequences of documents (e.g. social media messages) and capture the notion that human language is moderated by changing human states. We introduce HaRT, a large-scale transformer model for the HuLM task, pre-trained on approximately 100,000 social media users, and demonstrate its effectiveness in terms of both language modeling (perplexity) for social media and fine-tuning for 4 downstream tasks spanning document and user levels: stance detection, sentiment classification, age estimation, and personality assessment. Results on all tasks meet or surpass the current state-of-the-art.
    DeepFilterNet2: Towards Real-Time Speech Enhancement on Embedded Devices for Full-Band Audio. (arXiv:2205.05474v1 [eess.AS])
    Deep learning-based speech enhancement has seen huge improvements and recently also expanded to full band audio (48 kHz). However, many approaches have a rather high computational complexity and require big temporal buffers for real time usage e.g. due to temporal convolutions or attention. Both make those approaches not feasible on embedded devices. This work further extends DeepFilterNet, which exploits harmonic structure of speech allowing for efficient speech enhancement (SE). Several optimizations in the training procedure, data augmentation, and network structure result in state-of-the-art SE performance while reducing the real-time factor to 0.04 on a notebook Core-i5 CPU. This makes the algorithm applicable to run on embedded devices in real-time. The DeepFilterNet framework can be obtained under an open source license.
    The Multiscale Structure of Neural Network Loss Functions: The Effect on Optimization and Origin. (arXiv:2204.11326v2 [cs.LG] UPDATED)
    Local quadratic approximation has been extensively used to study the optimization of neural network loss functions around the minimum. However, it usually holds only in a very small neighborhood of the minimum, and cannot explain many phenomena observed during the optimization process. In this work, we study the structure of neural network loss functions and its implication on optimization in a region beyond the reach of good quadratic approximation. Numerically, we observe that neural network loss functions possess a multiscale structure, manifested in two ways: (1) in a neighborhood of minima, the loss mixes a continuum of scales and grows subquadratically, and (2) in a larger region, the loss shows several separate scales clearly. Using the subquadratic growth, we are able to explain the Edge of Stability phenomenon [4] observed for the gradient descent (GD) method. Using the separate scales, we explain the working mechanism of learning rate decay by simple examples. Finally, we study the origin of the multiscale structure and propose that the non-uniformity of training data is one of its causes. By constructing a two-layer neural network problem we show that training data with different magnitudes give rise to different scales of the loss function, producing subquadratic growth or multiple separate scales.
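As context for the quadratic baseline that this abstract argues is too limited, the sketch below shows the textbook behaviour of gradient descent on a one-dimensional quadratic: the iterates shrink only when the learning rate is below 2 divided by the curvature. This is not the paper's multiscale analysis; the function name and the specific values are assumptions for illustration.

```python
def gradient_descent_quadratic(curvature, lr, steps=50, x0=1.0):
    """Run GD on f(x) = 0.5 * curvature * x**2 and return the final |x|."""
    x = x0
    for _ in range(steps):
        x = x - lr * curvature * x  # the gradient of f is curvature * x
    return abs(x)

curvature = 4.0
stable = gradient_descent_quadratic(curvature, lr=1.9 / curvature)    # below 2/curvature
unstable = gradient_descent_quadratic(curvature, lr=2.1 / curvature)  # above 2/curvature
```

Each step multiplies `x` by `1 - lr * curvature`, so the first run contracts by a factor of 0.9 per step in magnitude and the second grows by a factor of 1.1.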
    RISP: Rendering-Invariant State Predictor with Differentiable Simulation and Rendering for Cross-Domain Parameter Estimation. (arXiv:2205.05678v1 [cs.CV])
    This work considers identifying parameters characterizing a physical system's dynamic motion directly from a video whose rendering configurations are inaccessible. Existing solutions require massive training data or lack generalizability to unknown rendering configurations. We propose a novel approach that marries domain randomization and differentiable rendering gradients to address this problem. Our core idea is to train a rendering-invariant state-prediction (RISP) network that transforms image differences into state differences independent of rendering configurations, e.g., lighting, shadows, or material reflectance. To train this predictor, we formulate a new loss on rendering variances using gradients from differentiable rendering. Moreover, we present an efficient, second-order method to compute the gradients of this loss, allowing it to be integrated seamlessly into modern deep learning frameworks. We evaluate our method in rigid-body and deformable-body simulation environments using four tasks: state estimation, system identification, imitation learning, and visuomotor control. We further demonstrate the efficacy of our approach on a real-world example: inferring the state and action sequences of a quadrotor from a video of its motion sequences. Compared with existing methods, our approach achieves significantly lower reconstruction errors and has better generalizability among unknown rendering configurations.
    Spatial-temporal associations representation and application for process monitoring using graph convolution neural network. (arXiv:2205.05250v1 [cs.LG])
    Industrial process data reflect the dynamic changes of operating conditions, which mainly appear as irregular changes in the dynamic associations between different variables at different times. Knowledge of these associations is often implicit in the dynamic monitoring data, which carry rich information about operating conditions but have not received enough attention in current research. To this end, a new process monitoring method based on a spatial-based graph convolution neural network (SGCN) is proposed to describe the characteristics of the dynamic associations, which can be used to represent the operation status over time. Spatial-temporal graphs are first defined, which can be used to represent the characteristics of node attributes (dynamic edge features) that change dynamically with time. Then, the associations between monitoring variables at a certain time are taken as the node attributes to define a snapshot of the static graph network at that time. Finally, the snapshot containing the graph structure and node attributes is used as the model input, which is processed to implement graph classification by a spatial-based convolution graph neural network with aggregate and readout steps. The feasibility and applicability of the proposed method are demonstrated by experimental results on a benchmark and a practical case application.
    Reducing a complex two-sided smartwatch examination for Parkinson's Disease to an efficient one-sided examination preserving machine learning accuracy. (arXiv:2205.05361v1 [cs.LG])
    Sensors from smart consumer devices have demonstrated high potential to serve as digital biomarkers in the identification of movement disorders in recent years. Using broadly available smartwatches, we have recorded participants performing technology-based assessments in a prospective study to research Parkinson's Disease (PD). In total, 504 participants, including PD patients, differential diagnoses (DD) and healthy controls (HC), were captured with a comprehensive system utilizing two smartwatches and two smartphones. To the best of our knowledge, this study provided the largest PD sample size of two-hand synchronous smartwatch measurements. To establish a future easy-to-use home-based assessment system in PD screening, we systematically evaluated the performance of the system based on a significantly reduced set of assessments with only one-sided measures and assessed whether we can maintain classification accuracy.
    Making Pre-trained Language Models Good Long-tailed Learners. (arXiv:2205.05461v1 [cs.CL])
    Prompt-tuning has shown appealing performance in few-shot classification by virtue of its capability in effectively exploiting pre-trained knowledge. This motivates us to check the hypothesis that prompt-tuning is also a promising choice for long-tailed classification, since the tail classes are intuitively few-shot ones. To achieve this aim, we conduct empirical studies to examine the hypothesis. The results demonstrate that prompt-tuning exactly makes pre-trained language models at least good long-tailed learners. For intuitions on why prompt-tuning can achieve good performance in long-tailed classification, we carry out an in-depth analysis by progressively bridging the gap between prompt-tuning and commonly used fine-tuning. The summary is that the classifier structure and parameterization form the key to making good long-tailed learners, in comparison with the less important input structure. Finally, we verify the applicability of our finding to few-shot classification.
    Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning. (arXiv:2205.05638v1 [cs.LG])
    Few-shot in-context learning (ICL) enables pre-trained language models to perform a previously-unseen task without any gradient-based training by feeding a small number of training examples as part of the input. ICL incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made. Parameter-efficient fine-tuning (e.g. adapter modules, prompt tuning, sparse update methods, etc.) offers an alternative paradigm where a small set of parameters are trained to enable a model to perform the new task. In this paper, we rigorously compare few-shot ICL and parameter-efficient fine-tuning and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs. Along the way, we introduce a new parameter-efficient fine-tuning method called (IA)$^3$ that scales activations by learned vectors, attaining stronger performance while only introducing a relatively tiny amount of new parameters. We also propose a simple recipe based on the T0 model called T-Few that can be applied to new tasks without task-specific tuning or modifications. We validate the effectiveness of T-Few on completely unseen tasks by applying it to the RAFT benchmark, attaining super-human performance for the first time and outperforming the state-of-the-art by 6% absolute. All of the code used in our experiments is publicly available.
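The phrase "scales activations by learned vectors" can be illustrated with a minimal NumPy sketch. It shows only the element-wise rescaling itself; the function name `ia3_scale`, the array shapes, and the example values are assumptions made for illustration and are not taken from the paper's code.

```python
import numpy as np

def ia3_scale(activations, scale):
    """Rescale each feature of `activations` by a (hypothetical) learned vector."""
    activations = np.asarray(activations, dtype=float)
    scale = np.asarray(scale, dtype=float)
    if activations.shape[-1] != scale.shape[-1]:
        raise ValueError("scale must have one entry per feature")
    # Broadcasting applies the same per-feature scale to every row.
    return activations * scale

# Illustrative values only: two positions, three features.
hidden = np.array([[1.0, -2.0, 0.5],
                   [0.0, 3.0, 4.0]])
identity = np.ones(3)                      # a vector of ones leaves activations unchanged
rescaled = ia3_scale(hidden, np.array([2.0, 0.5, 0.0]))
```

Only the scaling vector would be trained in such a scheme, so the number of new parameters equals the feature dimension rather than the size of a weight matrix.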
    Choice of training label matters: how to best use deep learning for quantitative MRI parameter estimation. (arXiv:2205.05587v1 [physics.med-ph])
    Deep learning (DL) is gaining popularity as a parameter estimation method for quantitative MRI. A range of competing implementations have been proposed, relying on either supervised or self-supervised learning. Self-supervised approaches, sometimes referred to as unsupervised, have been loosely based on auto-encoders, whereas supervised methods have, to date, been trained on groundtruth labels. These two learning paradigms have been shown to have distinct strengths. Notably, self-supervised approaches have offered lower-bias parameter estimates than their supervised alternatives. This result is counterintuitive - incorporating prior knowledge with supervised labels should, in theory, lead to improved accuracy. In this work, we show that this apparent limitation of supervised approaches stems from the naive choice of groundtruth training labels. By training on labels which are deliberately not groundtruth, we show that the low-bias parameter estimation previously associated with self-supervised methods can be replicated - and improved on - within a supervised learning framework. This approach sets the stage for a single, unifying, deep learning parameter estimation framework, based on supervised learning, where trade-offs between bias and variance are made by careful adjustment of training label.
    Benchmarking Graph Neural Networks. (arXiv:2003.00982v4 [cs.LG] UPDATED)
    In the last few years, graph neural networks (GNNs) have become the standard toolkit for analyzing and learning from data on graphs. This emerging field has witnessed an extensive growth of promising techniques that have been applied with success to computer science, mathematics, biology, physics and chemistry. But for any successful field to become mainstream and reliable, benchmarks must be developed to quantify progress. This led us in March 2020 to release a benchmark framework that i) comprises a diverse collection of mathematical and real-world graphs, ii) enables fair model comparison with the same parameter budget to identify key architectures, iii) has an open-source, easy-to-use and reproducible code infrastructure, and iv) is flexible for researchers to experiment with new theoretical ideas. As of May 2022, the GitHub repository has reached 1,800 stars and 339 forks, which demonstrates the utility of the proposed open-source framework through its wide usage by the GNN community. In this paper, we present an updated version of our benchmark with a concise presentation of the aforementioned framework characteristics and an additional medium-sized molecular dataset, AQSOL, similar to the popular ZINC but with a real-world measured chemical target, and we discuss how this framework can be leveraged to explore new GNN designs and insights. As a proof of value of our benchmark, we study the case of graph positional encoding (PE) in GNNs, which was introduced with this benchmark and has since spurred interest in exploring more powerful PE for Transformers and GNNs in a robust experimental setting.
    CVTT: Cross-Validation Through Time. (arXiv:2205.05393v1 [cs.LG])
    The practical aspects of evaluating recommender systems are an actively discussed topic in the research community. While many current evaluation techniques reduce performance to a single-value metric as a straightforward approach for model comparison, this rests on a strong assumption that the methods' performance is stable over time. In this paper, we argue that leaving out a method's continuous performance can lead to losing valuable insight into joint data-method effects. We propose the Cross-Validation Through Time (CVTT) technique to perform more detailed evaluations, which focus on model cross-validation performance over time. Using the proposed technique, we conduct a detailed analysis of popular RecSys algorithms' performance against various metrics and datasets. We also compare several data preparation and evaluation strategies to analyze their impact on model performance. Our results show that model performance can vary significantly over time, and both data and evaluation setup can have a marked effect on it.
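One common way to evaluate a model over time is an expanding-window split, where each fold trains on all earlier periods and tests on the next one. The sketch below shows that generic scheme only; whether CVTT uses exactly this split is not stated in the abstract, and the function name `splits_through_time` and its parameters are hypothetical.

```python
def splits_through_time(n_periods, min_train=1):
    """Yield (train_periods, test_period) pairs; training always precedes testing."""
    for test_period in range(min_train, n_periods):
        # Train on every period strictly before the test period.
        yield list(range(test_period)), test_period

# Four consecutive time periods, indexed 0..3 (illustrative).
folds = list(splits_through_time(4))
for train_periods, test_period in folds:
    print(f"train on {train_periods}, test on {test_period}")
```

Recording one metric value per fold, rather than a single average, gives a performance curve over time.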
    End-to-End Multi-Person Audio/Visual Automatic Speech Recognition. (arXiv:2205.05586v1 [eess.AS])
    Traditionally, audio-visual automatic speech recognition has been studied under the assumption that the speaking face on the visual signal is the face matching the audio. However, in a more realistic setting, when multiple faces are potentially on screen one needs to decide which face to feed to the A/V ASR system. The present work takes the recent progress of A/V ASR one step further and considers the scenario where multiple people are simultaneously on screen (multi-person A/V ASR). We propose a fully differentiable A/V ASR model that is able to handle multiple face tracks in a video. Instead of relying on two separate models for speaker face selection and audio-visual ASR on a single face track, we introduce an attention layer to the ASR encoder that is able to soft-select the appropriate face video track. Experiments carried out on an A/V system trained on over 30k hours of YouTube videos illustrate that the proposed approach can automatically select the proper face tracks with minor WER degradation compared to an oracle selection of the speaking face while still showing benefits of employing the visual signal instead of the audio alone.
    Spatial Graph Attention and Curiosity-driven Policy for Antiviral Drug Discovery. (arXiv:2106.02190v6 [cs.LG] UPDATED)
    We developed Distilled Graph Attention Policy Network (DGAPN), a reinforcement learning model to generate novel graph-structured chemical representations that optimize user-defined objectives by efficiently navigating a physically constrained domain. The framework is examined on the task of generating molecules that are designed to bind, noncovalently, to functional sites of SARS-CoV-2 proteins. We present a spatial Graph Attention (sGAT) mechanism that leverages self-attention over both node and edge attributes as well as encoding the spatial structure -- this capability is of considerable interest in synthetic biology and drug discovery. An attentional policy network is introduced to learn the decision rules for a dynamic, fragment-based chemical environment, and state-of-the-art policy gradient techniques are employed to train the network with stability. Exploration is driven by the stochasticity of the action space design and the innovation reward bonuses learned and proposed by random network distillation. In experiments, our framework achieved outstanding results compared to state-of-the-art algorithms, while reducing the complexity of paths to chemical synthesis.
    Learning Spatiotemporal Chaos Using Next-Generation Reservoir Computing. (arXiv:2203.13294v2 [cs.LG] UPDATED)
    Forecasting the behavior of high-dimensional dynamical systems using machine learning requires efficient methods to learn the underlying physical model. We demonstrate spatiotemporal chaos prediction using a machine learning architecture that, when combined with a next-generation reservoir computer, displays state-of-the-art performance with a training time $10^3-10^4$ times faster and training data set $\sim 10^2$ times smaller than other machine learning algorithms. We also take advantage of the translational symmetry of the model to further reduce the computational cost and training data, each by a factor of $\sim$10.
    Performance of a deep learning system for detection of referable diabetic retinopathy in real clinical settings. (arXiv:2205.05554v1 [eess.IV])
    Background: To determine the ability of a commercially available deep learning system, RetCAD v.1.3.1 (Thirona, Nijmegen, The Netherlands), to automatically detect referable diabetic retinopathy (DR) on a dataset of colour fundus images acquired during routine clinical practice in a tertiary hospital screening program, and to analyze the workload reduction that can be achieved by incorporating this artificial intelligence-based technology. Methods: Evaluation of the software was performed on a dataset of 7195 nonmydriatic fundus images from 6325 eyes of 3189 diabetic patients attending our screening program between February and December 2019. The software generated a DR severity score for each colour fundus image, which was combined into an eye-level score. This score was then compared with a reference standard as set by a human expert using receiver operating characteristic (ROC) curve analysis. Results: The artificial intelligence (AI) software achieved an area under the ROC curve (AUC) value of 0.988 [0.981:0.993] for the detection of referable DR. At the proposed operating point, the sensitivity of the RetCAD software for DR is 90.53% and specificity is 97.13%. A workload reduction of 96% could be achieved at the cost of only 6 false negatives. Conclusions: The AI software correctly identified the vast majority of referable DR cases, with a workload reduction of 96% of the cases that would need to be checked, while missing almost no true cases, so it may therefore be used as an instrument for triage.
    Deep Graph Clustering via Mutual Information Maximization and Mixture Model. (arXiv:2205.05168v1 [cs.LG])
    Attributed graph clustering or community detection which learns to cluster the nodes of a graph is a challenging task in graph analysis. In this paper, we introduce a contrastive learning framework for learning clustering-friendly node embedding. Although graph contrastive learning has shown outstanding performance in self-supervised graph learning, using it for graph clustering is not well explored. We propose Gaussian mixture information maximization (GMIM) which utilizes a mutual information maximization approach for node embedding. Meanwhile, it assumes that the representation space follows a Mixture of Gaussians (MoG) distribution. The clustering part of our objective tries to fit a Gaussian distribution to each community. The node embedding is jointly optimized with the parameters of MoG in a unified framework. Experiments on real-world datasets demonstrate the effectiveness of our method in community detection.
    Eliminating Sharp Minima from SGD with Truncated Heavy-tailed Noise. (arXiv:2102.04297v4 [cs.LG] UPDATED)
    The empirical success of deep learning is often attributed to SGD's mysterious ability to avoid sharp local minima in the loss landscape, as sharp minima are known to lead to poor generalization. Recently, empirical evidence of heavy-tailed gradient noise was reported in many deep learning tasks, and it was shown in Şimşekli (2019a,b) that SGD can escape sharp local minima under the presence of such heavy-tailed gradient noise, providing a partial solution to the mystery. In this work, we analyze a popular variant of SGD where gradients are truncated above a fixed threshold. We show that it achieves a stronger notion of avoiding sharp minima: it can effectively eliminate sharp local minima entirely from its training trajectory. We characterize the dynamics of truncated SGD driven by heavy-tailed noises. First, we show that the truncation threshold and width of the attraction field dictate the order of the first exit time from the associated local minimum. Moreover, when the objective function satisfies appropriate structural conditions, we prove that as the learning rate decreases, the dynamics of heavy-tailed truncated SGD closely resemble those of a continuous-time Markov chain that never visits any sharp minima. Real data experiments on deep learning confirm our theoretical prediction that heavy-tailed SGD with gradient clipping finds "flatter" local minima and achieves better generalization.
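A rough sketch of an SGD step with gradients truncated above a fixed threshold is given below. It assumes truncation by rescaling the gradient to a maximum Euclidean norm, which is the usual form of gradient clipping; the exact truncation operator analyzed in the paper may differ, and the name `truncated_sgd_step` and all numeric values are illustrative.

```python
import numpy as np

def truncated_sgd_step(params, grad, lr, threshold):
    """One SGD step where the gradient norm is capped at `threshold` (assumed form)."""
    params = np.asarray(params, dtype=float)
    grad = np.asarray(grad, dtype=float)
    norm = np.linalg.norm(grad)
    if norm > threshold:
        # Keep the direction, shrink the length to exactly `threshold`.
        grad = grad * (threshold / norm)
    return params - lr * grad

start = np.zeros(2)
# A large (e.g. heavy-tailed) gradient of norm 5 is cut down to norm 1.
clipped = truncated_sgd_step(start, [3.0, 4.0], lr=0.1, threshold=1.0)
# A small gradient of norm 0.5 passes through unchanged.
unclipped = truncated_sgd_step(start, [0.3, 0.4], lr=0.1, threshold=1.0)
```

With clipping, no single step can move the parameters farther than `lr * threshold`, however large the noisy gradient is.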
    A simple framework for contrastive learning phases of matter. (arXiv:2205.05607v1 [cond-mat.dis-nn])
    A main task in condensed-matter physics is to recognize, classify, and characterize phases of matter and the corresponding phase transitions, for which machine learning provides a new class of research tools due to the remarkable development in computing power and algorithms. Despite much exploration in this new field, usually different methods and techniques are needed for different scenarios. Here, we present SimCLP: a simple framework for contrastive learning phases of matter, which is inspired by the recent development in contrastive learning of visual representations. We demonstrate the success of this framework on several representative systems, including classical and quantum, single-particle and many-body, conventional and topological. SimCLP is flexible and free of usual burdens such as manual feature engineering and prior knowledge. The only prerequisite is to prepare enough state configurations. Furthermore, it can generate representation vectors and labels and hence help tackle other problems. SimCLP therefore paves an alternative way to the development of a generic tool for identifying unexplored phase transitions.
    A Model-Free Sampling Method for Estimating Basins of Attraction Using Hybrid Active Learning (HAL). (arXiv:2003.10976v3 [cs.LG] UPDATED)
    Understanding the basins of attraction (BoA) is often a paramount consideration for nonlinear systems. Most existing approaches to determining a high-resolution BoA require prior knowledge of the system's dynamical model (e.g., differential equation or point mapping for continuous systems, cell mapping for discrete systems, etc.), which allows derivation of approximate analytical solutions or parallel computing on a multi-core computer to find the BoA efficiently. However, these methods are typically impractical when the BoA must be determined experimentally or when the system's model is unknown. This paper introduces a model-free sampling method for BoA. The proposed method is based upon hybrid active learning (HAL) and is designed to find and label the "informative" samples, which efficiently determine the boundary of BoA. It consists of three primary parts: 1) additional sampling on trajectories (AST) to maximize the number of samples obtained from each simulation or experiment; 2) an active learning (AL) algorithm to exploit the local boundary of BoA; and 3) a density-based sampling (DBS) method to explore the global boundary of BoA. An example of estimating the BoA for a bistable nonlinear system is presented to show the high efficiency of our HAL sampling method.
    Stochastic differential equations for limiting description of UCB rule for Gaussian multi-armed bandits. (arXiv:2112.06423v2 [cs.LG] UPDATED)
    We consider the upper confidence bound strategy for Gaussian multi-armed bandits with known control horizon sizes $N$ and build its limiting description with a system of stochastic differential equations and ordinary differential equations. Rewards for the arms are assumed to have unknown expected values and known variances. A set of Monte-Carlo simulations was performed for the case of close distributions of rewards, when mean rewards differ by the magnitude of order $N^{-1/2}$, as it yields the highest normalized regret, to verify the validity of the obtained description. The minimal size of the control horizon when the normalized regret is not noticeably larger than maximum possible was estimated.
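For orientation, a textbook form of the upper confidence bound rule for Gaussian arms with known variance can be sketched as follows. The index formula (sample mean plus a bonus of sqrt(c * sigma^2 * ln t / n)), the constant `c`, and the function names are assumptions for illustration; the abstract does not specify the exact index the authors use.

```python
import math

def ucb_indices(means, counts, t, sigma=1.0, c=2.0):
    """Upper confidence index per arm: sample mean plus an exploration bonus."""
    return [
        m + math.sqrt(c * sigma ** 2 * math.log(t) / n)
        for m, n in zip(means, counts)
    ]

def choose_arm(means, counts, t, sigma=1.0, c=2.0):
    """Pick the arm with the largest index (ties go to the lowest arm number)."""
    indices = ucb_indices(means, counts, t, sigma, c)
    return max(range(len(indices)), key=lambda i: indices[i])

# Arm 1 has a lower sample mean but was pulled only once, so its bonus dominates.
rarely_pulled = choose_arm(means=[0.5, 0.4], counts=[10, 1], t=11)
# With equal pull counts the bonuses cancel and the higher mean wins.
equal_counts = choose_arm(means=[0.5, 0.4], counts=[5, 5], t=10)
```

Each arm must be pulled at least once before the index is defined, since the bonus divides by the pull count.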
    Sibylvariant Transformations for Robust Text Classification. (arXiv:2205.05137v1 [cs.CL])
    The vast majority of text transformation techniques in NLP are inherently limited in their ability to expand input space coverage due to an implicit constraint to preserve the original class label. In this work, we propose the notion of sibylvariance (SIB) to describe the broader set of transforms that relax the label-preserving constraint, knowably vary the expected class, and lead to significantly more diverse input distributions. We offer a unified framework to organize all data transformations, including two types of SIB: (1) Transmutations convert one discrete kind into another, (2) Mixture Mutations blend two or more classes together. To explore the role of sibylvariance within NLP, we implemented 41 text transformations, including several novel techniques like Concept2Sentence and SentMix. Sibylvariance also enables a unique form of adaptive training that generates new input mixtures for the most confused class pairs, challenging the learner to differentiate with greater nuance. Our experiments on six benchmark datasets strongly support the efficacy of sibylvariance for generalization performance, defect detection, and adversarial robustness.
    DNA data storage, sequencing data-carrying DNA. (arXiv:2205.05488v1 [cs.ET])
    DNA is a leading candidate as the next archival storage medium due to its density, durability and sustainability. To read (and write) data, DNA storage exploits technology that has been developed over decades to sequence naturally occurring DNA in the life sciences. To achieve higher accuracy for previously unseen, biological DNA, sequencing relies on extending and training deep machine learning models known as basecallers. This growth in model complexity requires substantial resources, both computation and data sets. It also eliminates the possibility of a compact read head for DNA as a storage medium. We argue that we need to depart from blindly using sequencing models from the life sciences for DNA data storage. The difference is striking: for life science applications we have no control over the DNA; however, in the case of DNA data storage, we control how it is written, as well as the particular write head. More specifically, data-carrying DNA can be modulated and embedded with alignment markers and error correcting codes to guarantee higher fidelity and to carry out some of the work that the machine learning models perform. In this paper, we study accuracy trade-offs between deep model size and error correcting codes. We show that, starting with a model size of 107MB, the reduced accuracy from model compression can be compensated by using simple error correcting codes in the DNA sequences. In our experiments, we show that a substantial reduction in the size of the model does not incur an undue penalty for the error correcting codes used, therefore paving the way for a portable data-carrying DNA read head. Crucially, we show that through the joint use of model compression and error correcting codes, we achieve a higher read accuracy than without compression and error correcting codes.
    A Unified f-divergence Framework Generalizing VAE and GAN. (arXiv:2205.05214v1 [stat.ML])
    Developing deep generative models that flexibly incorporate diverse measures of probability distance is an important area of research. Here we develop a unified mathematical framework of f-divergence generative model, f-GM, that incorporates both VAE and f-GAN, and enables tractable learning with general f-divergences. f-GM allows the experimenter to flexibly design the f-divergence function without changing the structure of the networks or the learning procedure. f-GM jointly models three components: a generator, an inference network and a density estimator. Therefore it simultaneously enables sampling, posterior inference of the latent variable as well as evaluation of the likelihood of an arbitrary datum. f-GM belongs to the class of encoder-decoder GANs: our density estimator can be interpreted as playing the role of a discriminator between samples in the joint space of latent code and observed space. We prove that f-GM naturally simplifies to the standard VAE and to f-GAN as special cases, and illustrate the connections between different encoder-decoder GAN architectures. f-GM is compatible with general network architecture and optimizer. We leverage it to experimentally explore the effects -- e.g. mode collapse and image sharpness -- of different choices of f-divergence.
    REIN-2: Giving Birth to Prepared Reinforcement Learning Agents Using Reinforcement Learning Agents. (arXiv:2110.05128v2 [cs.LG] UPDATED)
    Deep Reinforcement Learning (Deep RL) has been in the spotlight for the past few years, due to its remarkable ability to solve problems which were considered to be practically unsolvable using traditional Machine Learning methods. However, even state-of-the-art Deep RL algorithms have various weaknesses that prevent them from being used extensively within industry applications, with one such major weakness being their sample-inefficiency. In an effort to patch these issues, we integrated a meta-learning technique in order to shift the objective of learning to solve a task into the objective of learning how to learn to solve a task (or a set of tasks), which we empirically show improves the overall stability and performance of Deep RL algorithms. Our model, named REIN-2, is a meta-learning scheme formulated within the RL framework, the goal of which is to develop a meta-RL agent (meta-learner) that learns how to produce other RL agents (inner-learners) that are capable of solving given environments. For this task, we convert the typical interaction of an RL agent with the environment into a new, single environment for the meta-learner to interact with. Compared to traditional state-of-the-art Deep RL algorithms, experimental results show remarkable performance of our model in popular OpenAI Gym environments in terms of scoring and sample efficiency, including the Mountain Car hard-exploration environment.
    Advanced sleep spindle identification with neural networks. (arXiv:2202.05158v2 [eess.SP] UPDATED)
    Sleep spindles are neurophysiological phenomena that appear to be linked to memory formation and other functions of the central nervous system, and that can be observed in electroencephalographic recordings (EEG) during sleep. Manually identified spindle annotations in EEG recordings suffer from substantial intra- and inter-rater variability, even if raters have been highly trained, which reduces the reliability of spindle measures as a research and diagnostic tool. The Massive Online Data Annotation (MODA) project has recently addressed this problem by forming a consensus from multiple such rating experts, thus providing a corpus of spindle annotations of enhanced quality. Based on this dataset, we present a U-Net-type deep neural network model to automatically detect sleep spindles. Our model's performance exceeds that of the state-of-the-art detector and of most experts in the MODA dataset. We observed improved detection accuracy in subjects of all ages, including older individuals whose spindles are particularly challenging to detect reliably. Our results underline the potential of automated methods to do repetitive cumbersome tasks with super-human performance.
    Extracting Latent Steering Vectors from Pretrained Language Models. (arXiv:2205.05124v1 [cs.CL])
    Prior work on controllable text generation has focused on learning how to control language models through trainable decoding, smart-prompt design, or fine-tuning based on a desired objective. We hypothesize that the information needed to steer the model to generate a target sentence is already encoded within the model. Accordingly, we explore a different approach altogether: extracting latent vectors directly from pretrained language model decoders without fine-tuning. Experiments show that there exist steering vectors, which, when added to the hidden states of the language model, generate a target sentence nearly perfectly (> 99 BLEU) for English sentences from a variety of domains. We show that vector arithmetic can be used for unsupervised sentiment transfer on the Yelp sentiment benchmark, with performance comparable to models tailored to this task. We find that distances between steering vectors reflect sentence similarity when evaluated on a textual similarity benchmark (STS-B), outperforming pooled hidden states of models. Finally, we present an analysis of the intrinsic properties of the steering vectors. Taken together, our results suggest that frozen LMs can be effectively controlled through their latent steering space.
    Evaluation Gaps in Machine Learning Practice. (arXiv:2205.05256v1 [cs.LG])
    Forming a reliable judgement of a machine learning (ML) model's appropriateness for an application ecosystem is critical for its responsible use, and requires considering a broad range of factors including harms, benefits, and responsibilities. In practice, however, evaluations of ML models frequently focus on only a narrow range of decontextualized predictive behaviours. We examine the evaluation gaps between the idealized breadth of evaluation concerns and the observed narrow focus of actual evaluations. Through an empirical study of papers from recent high-profile conferences in the Computer Vision and Natural Language Processing communities, we demonstrate a general focus on a handful of evaluation methods. By considering the metrics and test data distributions used in these methods, we draw attention to which properties of models are centered in the field, revealing the properties that are frequently neglected or sidelined during evaluation. By studying these properties, we demonstrate the machine learning discipline's implicit assumption of a range of commitments which have normative impacts; these include commitments to consequentialism, abstractability from context, the quantifiability of impacts, the limited role of model inputs in evaluation, and the equivalence of different failure modes. Shedding light on these assumptions enables us to question their appropriateness for ML system contexts, pointing the way towards more contextualized evaluation methodologies for robustly examining the trustworthiness of ML models.
    Internet of Behavior (IoB) and Explainable AI Systems for Influencing IoT Behavior. (arXiv:2109.07239v2 [cs.DC] UPDATED)
    Pandemics and natural disasters over the years have changed the behavior of people, which has had a tremendous impact on all aspects of life. With the technologies available in each era, governments, organizations, and companies have used these technologies to track, control, and influence the behavior of individuals for a benefit. Nowadays, the use of the Internet of Things (IoT), cloud computing, and artificial intelligence (AI) has made it easier to track and change the behavior of users through changing IoT behavior. This article introduces and discusses the concept of the Internet of Behavior (IoB) and its integration with Explainable AI (XAI) techniques to provide a trusted and evident experience in the process of changing IoT behavior to ultimately improve users' behavior. Therefore, a system based on IoB and XAI has been proposed in a use case scenario of electrical power consumption that aims to influence user consuming behavior to reduce power consumption and cost. The scenario results showed a decrease of 522.2 kW of active power when compared to original consumption over a 200-hour period. It also showed a total power cost saving of 95.04 Euro for the same period. Moreover, decreasing the global active power will reduce the power intensity through the positive correlation.
    Is calibration a fairness requirement? An argument from the point of view of moral philosophy and decision theory. (arXiv:2205.05512v1 [cs.LG])
    In this paper, we provide a moral analysis of two criteria of statistical fairness debated in the machine learning literature: 1) calibration between groups and 2) equality of false positive and false negative rates between groups. In our paper, we focus on moral arguments in support of either measure. The conflict between group calibration vs. false positive and false negative rate equality is one of the core issues in the debate about group fairness definitions among practitioners. For any thorough moral analysis, the meaning of the term fairness has to be made explicit and defined properly. For our paper, we equate fairness with (non-)discrimination, which is a legitimate understanding in the discussion about group fairness. More specifically, we equate it with prima facie wrongful discrimination in the sense this is used in Prof. Lippert-Rasmussen's treatment of this definition. In this paper, we argue that a violation of group calibration may be unfair in some cases, but not unfair in others. This is in line with claims already advanced in the literature, that algorithmic fairness should be defined in a way that is sensitive to context. The most important practical implication is that arguments based on examples in which fairness requires between-group calibration, or equality in the false-positive/false-negative rates, do not generalize. For it may be that group calibration is a fairness requirement in one case, but not in another.
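    To make the two criteria in the abstract above concrete, here is a minimal sketch that computes per-group false positive rate, false negative rate, and positive predictive value for a binary classifier. Using equal positive predictive value as a coarse stand-in for calibration between groups is an assumption of this sketch, not something the abstract specifies; the function name `group_rates` and the toy counts are hypothetical and invented for illustration only.

```python
from collections import defaultdict


def group_rates(records):
    """Per-group error rates for a binary classifier.

    records: iterable of (group, y_true, y_pred) with labels in {0, 1}.
    Returns {group: {"fpr": ..., "fnr": ..., "ppv": ...}}; a rate is None
    when its denominator is zero.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for group, y_true, y_pred in records:
        if y_pred == 1:
            key = "tp" if y_true == 1 else "fp"
        else:
            key = "fn" if y_true == 1 else "tn"
        counts[group][key] += 1

    rates = {}
    for group, c in counts.items():
        negatives = c["fp"] + c["tn"]
        positives = c["tp"] + c["fn"]
        flagged = c["tp"] + c["fp"]
        rates[group] = {
            "fpr": c["fp"] / negatives if negatives else None,
            "fnr": c["fn"] / positives if positives else None,
            # PPV: coarse proxy for calibration of a binary predictor (assumption).
            "ppv": c["tp"] / flagged if flagged else None,
        }
    return rates


def make_group(group, tp, fp, tn, fn):
    """Expand confusion-matrix counts into individual records (toy helper)."""
    return ([(group, 1, 1)] * tp + [(group, 0, 1)] * fp
            + [(group, 0, 0)] * tn + [(group, 1, 0)] * fn)


# Made-up data: the two groups have different base rates (5/10 vs 4/10).
records = make_group("A", tp=4, fp=1, tn=4, fn=1) + make_group("B", tp=4, fp=1, tn=5, fn=0)
rates = group_rates(records)
for group in sorted(rates):
    print(group, rates[group])
```

    In this toy data both groups have the same positive predictive value while their false positive and false negative rates differ, which is the kind of tension between the two criteria that the abstract refers to.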
    RLOP: RL Methods in Option Pricing from a Mathematical Perspective. (arXiv:2205.05600v1 [q-fin.PR])
    In this work, we build two environments, namely the modified QLBS and RLOP models, from a mathematical perspective; these enable RL methods in option pricing through portfolio replication. We implement the environment specifications (the source code can be found at https://github.com/owen8877/RLOP), the learning algorithm, and agent parametrization by a neural network. The learned optimal hedging strategy is compared against the BS prediction. The effect of various factors is considered and studied based on how they affect the optimal price and position.
    Stochastic Variational Smoothed Model Checking. (arXiv:2205.05398v1 [cs.LG])
    Model-checking for parametric stochastic models can be expressed as checking the satisfaction probability of a certain property as a function of the parameters of the model. Smoothed model checking (smMC) leverages Gaussian Processes (GP) to infer the satisfaction function over the entire parameter space from a limited set of observations obtained via simulation. This approach provides accurate reconstructions with statistically sound quantification of the uncertainty. However, it inherits the scalability issues of GP. In this paper, we exploit recent advances in probabilistic machine learning to push this limitation forward, making Bayesian inference of smMC scalable to larger datasets, enabling its application to larger models in terms of the dimension of the parameter set. We propose Stochastic Variational Smoothed Model Checking (SV-smMC), a solution that exploits stochastic variational inference (SVI) to approximate the posterior distribution of the smMC problem. The strength and flexibility of SVI make SV-smMC applicable to two alternative probabilistic models: Gaussian Processes (GP) and Bayesian Neural Networks (BNN). Moreover, SVI makes inference easily parallelizable and it enables GPU acceleration. In this paper, we compare the performances of smMC against those of SV-smMC by looking at the scalability, the computational efficiency and at the accuracy of the reconstructed satisfaction function.
    Federated Learning from Only Unlabeled Data with Class-Conditional-Sharing Clients. (arXiv:2204.03304v2 [cs.LG] UPDATED)
    Supervised federated learning (FL) enables multiple clients to share the trained model without sharing their labeled data. However, potential clients might even be reluctant to label their own data, which could limit the applicability of FL in practice. In this paper, we show the possibility of unsupervised FL whose model is still a classifier for predicting class labels, if the class-prior probabilities are shifted while the class-conditional distributions are shared among the unlabeled data owned by the clients. We propose federation of unsupervised learning (FedUL), where the unlabeled data are transformed into surrogate labeled data for each of the clients, a modified model is trained by supervised FL, and the wanted model is recovered from the modified model. FedUL is a very general solution to unsupervised FL: it is compatible with many supervised FL methods, and the recovery of the wanted model can be theoretically guaranteed as if the data have been labeled. Experiments on benchmark and real-world datasets demonstrate the effectiveness of FedUL. Code is available at https://github.com/lunanbit/FedUL.
    Analysis of three dimensional potential problems in non-homogeneous media with physics-informed deep collocation method using material transfer learning and sensitivity analysis. (arXiv:2010.12060v2 [cs.LG] UPDATED)
    In this work, we present a deep collocation method for three dimensional potential problems in nonhomogeneous media. This approach utilizes a physics informed neural network with material transfer learning, reducing the solution of the nonhomogeneous partial differential equations to an optimization problem. We tested different configurations of the physics informed neural network including smooth activation functions, sampling methods for collocation points generation and combined optimizers. A material transfer learning technique is utilised for nonhomogeneous media with different material gradations and parameters, which enhances the generality and robustness of the proposed method. In order to identify the most influential parameters of the network configuration, we carried out a global sensitivity analysis. Finally, we provide a convergence proof of our DCM. The approach is validated through several benchmark problems, also testing different material variations.
    Blockchain-based Secure Client Selection in Federated Learning. (arXiv:2205.05611v1 [cs.CR])
    Despite the great potential of Federated Learning (FL) in large-scale distributed learning, the current system is still subject to several privacy issues due to the fact that local models trained by clients are exposed to the central server. Consequently, secure aggregation protocols for FL have been developed to conceal the local models from the server. However, we show that, by manipulating the client selection process, the server can circumvent the secure aggregation to learn the local models of a victim client, indicating that secure aggregation alone is inadequate for privacy protection. To tackle this issue, we leverage blockchain technology to propose a verifiable client selection protocol. Owing to the immutability and transparency of blockchain, our proposed protocol enforces a random selection of clients, making the server unable to control the selection process at its discretion. We present security proofs showing that our protocol is secure against this attack. Additionally, we conduct several experiments on an Ethereum-like blockchain to demonstrate the feasibility and practicality of our solution.
    Characterizing the Action-Generalization Gap in Deep Q-Learning. (arXiv:2205.05588v1 [cs.AI])
    We study the action generalization ability of deep Q-learning in discrete action spaces. Generalization is crucial for efficient reinforcement learning (RL) because it allows agents to use knowledge learned from past experiences on new tasks. But while function approximation provides deep RL agents with a natural way to generalize over state inputs, the same generalization mechanism does not apply to discrete action outputs. And yet, surprisingly, our experiments indicate that Deep Q-Networks (DQN), which use exactly this type of function approximator, are still able to achieve modest action generalization. Our main contribution is twofold: first, we propose a method of evaluating action generalization using expert knowledge of action similarity, and empirically confirm that action generalization leads to faster learning; second, we characterize the action-generalization gap (the difference in learning performance between DQN and the expert) in different domains. We find that DQN can indeed generalize over actions in several simple domains, but that its ability to do so decreases as the action space grows larger.
    Weak Supervision with Incremental Source Accuracy Estimation. (arXiv:2205.05302v1 [cs.LG])
    Motivated by the desire to generate labels for real-time data we develop a method to estimate the dependency structure and accuracy of weak supervision sources incrementally. Our method first estimates the dependency structure associated with the supervision sources and then uses this to iteratively update the estimated source accuracies as new data is received. Using both off-the-shelf classification models trained using publicly-available datasets and heuristic functions as supervision sources we show that our method generates probabilistic labels with an accuracy matching that of existing off-line methods.
    Aggregating Pairwise Semantic Differences for Few-Shot Claim Veracity Classification. (arXiv:2205.05646v1 [cs.CL])
    As part of an automated fact-checking pipeline, the claim veracity classification task consists in determining if a claim is supported by an associated piece of evidence. The complexity of gathering labelled claim-evidence pairs leads to a scarcity of datasets, particularly when dealing with new domains. In this paper, we introduce SEED, a novel vector-based method to few-shot claim veracity classification that aggregates pairwise semantic differences for claim-evidence pairs. We build on the hypothesis that we can simulate class representative vectors that capture average semantic differences for claim-evidence pairs in a class, which can then be used for classification of new instances. We compare the performance of our method with competitive baselines including fine-tuned BERT/RoBERTa models, as well as the state-of-the-art few-shot veracity classification method that leverages language model perplexity. Experiments conducted on the FEVER and SCIFACT datasets show consistent improvements over competitive baselines in few-shot settings. Our code is available.
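    The idea of class representative vectors described above can be sketched roughly as follows: take the difference between a claim embedding and its evidence embedding, average those differences per class, and assign a new pair to the class whose average is nearest. This is only an illustration under stated assumptions: the 2-D vectors stand in for sentence embeddings, Euclidean distance is an assumed choice, and all function names are hypothetical rather than taken from the SEED code.

```python
import math


def difference(claim_vec, evidence_vec):
    """Element-wise difference between a claim and an evidence embedding."""
    return [c - e for c, e in zip(claim_vec, evidence_vec)]


def class_representatives(examples):
    """Average the pairwise difference vectors of each class.

    examples: list of (claim_vec, evidence_vec, label).
    Returns {label: mean difference vector}.
    """
    sums, counts = {}, {}
    for claim_vec, evidence_vec, label in examples:
        diff = difference(claim_vec, evidence_vec)
        if label not in sums:
            sums[label] = [0.0] * len(diff)
            counts[label] = 0
        sums[label] = [s + d for s, d in zip(sums[label], diff)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def classify(claim_vec, evidence_vec, representatives):
    """Pick the class whose representative is closest (Euclidean, an assumption)."""
    diff = difference(claim_vec, evidence_vec)
    return min(representatives, key=lambda label: math.dist(diff, representatives[label]))


# Made-up 2-D "embeddings"; real use would plug in a sentence encoder.
few_shot = [
    ([1.0, 0.0], [0.9, 0.1], "SUPPORTS"),
    ([0.0, 1.0], [0.1, 0.9], "SUPPORTS"),
    ([1.0, 0.0], [0.0, 1.0], "REFUTES"),
    ([0.8, 0.2], [0.0, 1.0], "REFUTES"),
]
reps = class_representatives(few_shot)
label_far = classify([0.9, 0.1], [0.1, 0.9], reps)   # claim far from its evidence
label_near = classify([0.5, 0.5], [0.4, 0.6], reps)  # claim close to its evidence
print(label_far, label_near)
```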
    Generation of non-stationary stochastic fields using Generative Adversarial Networks with limited training data. (arXiv:2205.05469v1 [cs.LG])
    In the context of generating geological facies conditioned on observed data, samples corresponding to all possible conditions are not generally available in the training set and hence the generation of these realizations depends primarily on the generalization capability of the trained generative model. The problem becomes more complex when applied to non-stationary fields. In this work, we investigate the problem of training Generative Adversarial Network (GAN) models against a dataset of geological channelized patterns that has a few non-stationary spatial modes and examine the training and self-conditioning settings that improve the generalization capability at new spatial modes that were never seen in the given training set. The developed training method allowed for effective learning of the correlation between the spatial conditions (i.e. non-stationary maps) and the realizations implicitly without using additional loss terms or solving a costly optimization problem at the realization generation phase. Our models, trained on real and artificial datasets, were able to generate geologically-plausible realizations beyond the training samples with a strong correlation with the target maps.
    Contrastive Supervised Distillation for Continual Representation Learning. (arXiv:2205.05476v1 [cs.CV])
    In this paper, we propose a novel training procedure for the continual representation learning problem in which a neural network model is sequentially learned to alleviate catastrophic forgetting in visual search tasks. Our method, called Contrastive Supervised Distillation (CSD), reduces feature forgetting while learning discriminative features. This is achieved by leveraging labels information in a distillation setting in which the student model is contrastively learned from the teacher model. Extensive experiments show that CSD performs favorably in mitigating catastrophic forgetting by outperforming current state-of-the-art methods. Our results also provide further evidence that feature forgetting evaluated in visual retrieval tasks is not as catastrophic as in classification tasks. Code at: https://github.com/NiccoBiondi/ContrastiveSupervisedDistillation.
    Predicting hot electrons free energies from ground-state data. (arXiv:2205.05591v1 [cond-mat.mtrl-sci])
    Machine-learning potentials are usually trained on the ground-state, Born-Oppenheimer energy surface, which depends exclusively on the atomic positions and not on the simulation temperature. This disregards the effect of thermally-excited electrons, which is important in metals and essential to the description of warm dense matter. An accurate physical description of these effects requires that the nuclei move on a temperature-dependent electronic free energy. We propose a method to obtain machine-learning predictions of this free energy at an arbitrary electron temperature using exclusively training data from ground-state calculations, avoiding the need to train temperature-dependent potentials. We benchmark our method on metallic liquid hydrogen at the conditions of the core of gas giants and brown dwarfs.
    Delayed Reinforcement Learning by Imitation. (arXiv:2205.05569v1 [cs.LG])
    When the agent's observations or interactions are delayed, classic reinforcement learning tools usually fail. In this paper, we propose a simple yet new and efficient solution to this problem. We assume that, in the undelayed environment, an efficient policy is known or can be easily learned, but the task may suffer from delays in practice and we thus want to take them into account. We present a novel algorithm, Delayed Imitation with Dataset Aggregation (DIDA), which builds upon imitation learning methods to learn how to act in a delayed environment from undelayed demonstrations. We provide a theoretical analysis of the approach that will guide the practical design of DIDA. These results are also of general interest in the delayed reinforcement learning literature by providing bounds on the performance between delayed and undelayed tasks, under smoothness conditions. We show empirically that DIDA obtains high performances with a remarkable sample efficiency on a variety of tasks, including robotic locomotion, classic control, and trading.
    Efficient Distributed Framework for Collaborative Multi-Agent Reinforcement Learning. (arXiv:2205.05248v1 [cs.AI])
    Multi-agent reinforcement learning for incomplete information environments has attracted extensive attention from researchers. However, due to slow sample collection and poor sample exploration, multi-agent reinforcement learning still suffers from problems such as unstable model iteration and low training efficiency. Moreover, most existing distributed frameworks are proposed for single-agent reinforcement learning and are not suitable for the multi-agent setting. In this paper, we design a distributed MARL framework based on the actor-work-learner architecture. In this framework, multiple asynchronous environment interaction modules can be deployed simultaneously, which greatly improves the sample collection speed and sample diversity. Meanwhile, to make full use of computing resources, we decouple the model iteration from environment interaction, and thus accelerate the policy iteration. Finally, we verify the effectiveness of the proposed framework in the MaCA military simulation environment and the SMAC 3D real-time strategy gaming environment, both with incomplete-information characteristics.  ( 2 min )
    What is Proxy Discrimination?. (arXiv:2205.05265v1 [cs.LG])
    The near universal condemnation of proxy discrimination hides a disagreement over what it is. This work surveys various notions of proxy and proxy discrimination found in prior work and represents them in a common framework. These notions variously turn on statistical dependencies, causal effects, and intentions. It discusses the limitations and uses of each notion and of the concept as a whole.
    Access Trends of In-network Cache for Scientific Data. (arXiv:2205.05563v1 [cs.NI])
    Scientific collaborations are increasingly relying on large volumes of data for their work and many of them employ tiered systems to replicate the data to their worldwide user communities. Each user in the community often selects a different subset of data for their analysis tasks; however, members of a research group often are working on related research topics that require similar data objects. Thus, there is a significant amount of data sharing possible. In this work, we study the access traces of a federated storage cache known as the Southern California Petabyte Scale Cache. By studying the access patterns and potential for network traffic reduction by this caching system, we aim to explore the predictability of the cache uses and the potential for a more general in-network data caching. Our study shows that this distributed storage cache is able to reduce the network traffic volume by a factor of 2.35 during a part of the study period. We further show that machine learning models could predict cache utilization with an accuracy of 0.88. This demonstrates that such cache usage is predictable, which could be useful for managing complex networking resources such as in-network caching.  ( 2 min )
    NDGGNET-A Node Independent Gate based Graph Neural Networks. (arXiv:2205.05348v1 [cs.LG])
    Graph Neural Networks (GNNs) are an architecture for structured data that has been adopted in a wide range of tasks, such as link prediction, node classification, and graph classification, with strong results. Generally, for a given node in a graph, a traditional GNN layer can be regarded as an aggregation over one-hop neighbors, so a stack of layers can fetch and update node states across multiple hops. For nodes with sparse connectivity, it is difficult to obtain enough information through a single GNN layer: few nodes are directly connected to them, and a single layer cannot propagate higher-order neighbor information. However, as the number of layers increases, the GNN model is prone to over-smoothing for nodes with dense connectivity, which decreases accuracy. To tackle this issue, in this work we define a novel framework that allows a standard GNN model to accommodate more layers. Specifically, a node-degree based gate is employed to adjust the weight of layers dynamically, aiming to enhance the information aggregation ability and reduce the probability of over-smoothing. Experimental results show that our proposed model can effectively increase the model depth and perform well on several datasets.  ( 2 min )
    Automated differential equation solver based on the parametric approximation optimization. (arXiv:2205.05383v1 [math.NA])
    Numerical methods for solving differential equations produce a discrete field that converges to the solution, provided the method is applied to a problem it is suited to. However, each numerical method comes with a restricted class of equations for which convergence is proved for a given parameter set or range. Only a few "cheap and dirty" numerical methods converge on a wide class of equations without parameter tuning, at the price of a lower approximation order. The article presents a method that uses an optimization algorithm to obtain a solution using a parameterized approximation. The result may not be as precise as one obtained by an expert. However, it allows a wide class of equations to be solved in an automated manner without changing the algorithm's parameters.  ( 2 min )
  • Open

    Imitation of Manipulation Skills Using Multiple Geometries. (arXiv:2203.01171v2 [cs.RO] UPDATED)
    Daily manipulation tasks are characterized by regular characteristics associated with the task structure, which can be described by multiple geometric primitives related to actions and object shapes. Such geometric descriptors cannot be expressed only in Cartesian coordinate systems. In this paper, we propose a learning approach to extract the optimal representation from a dictionary of coordinate systems to represent an observed movement. This is achieved by using an extension of Gaussian distributions on Riemannian manifolds, which is used to analyse a set of user demonstrations statistically, by considering multiple geometries as candidate representations of the task. We formulate the reproduction problem as a general optimal control problem based on an iterative linear quadratic regulator (iLQR), where the Gaussian distributions in the extracted coordinate systems are used to define the cost function. We apply our approach to grasping and box opening tasks in simulation and on a 7-axis Franka Emika robot. The results show that the robot can exploit several geometries to execute the manipulation task and generalize it to new situations, by maintaining the invariant features of the skill in the coordinate system(s) of interest.
    Boundary Estimation from Point Clouds: Algorithms, Guarantees and Applications. (arXiv:2111.03217v2 [math.NA] UPDATED)
    We investigate identifying the boundary of a domain from sample points in the domain. We introduce new estimators for the normal vector to the boundary, distance of a point to the boundary, and a test for whether a point lies within a boundary strip. The estimators can be efficiently computed and are more accurate than the ones present in the literature. We provide rigorous error estimates for the estimators. Furthermore we use the detected boundary points to solve boundary-value problems for PDE on point clouds. We prove error estimates for the Laplace and eikonal equations on point clouds. Finally we provide a range of numerical experiments illustrating the performance of our boundary estimators, applications to PDE on point clouds, and tests on image data sets.
    A Model-Free Sampling Method for Estimating Basins of Attraction Using Hybrid Active Learning (HAL). (arXiv:2003.10976v3 [cs.LG] UPDATED)
    Understanding the basins of attraction (BoA) is often a paramount consideration for nonlinear systems. Most existing approaches to determining a high-resolution BoA require prior knowledge of the system's dynamical model (e.g., differential equation or point mapping for continuous systems, cell mapping for discrete systems, etc.), which allows derivation of approximate analytical solutions or parallel computing on a multi-core computer to find the BoA efficiently. However, these methods are typically impractical when the BoA must be determined experimentally or when the system's model is unknown. This paper introduces a model-free sampling method for BoA. The proposed method is based upon hybrid active learning (HAL) and is designed to find and label the "informative" samples, which efficiently determine the boundary of BoA. It consists of three primary parts: 1) additional sampling on trajectories (AST) to maximize the number of samples obtained from each simulation or experiment; 2) an active learning (AL) algorithm to exploit the local boundary of BoA; and 3) a density-based sampling (DBS) method to explore the global boundary of BoA. An example of estimating the BoA for a bistable nonlinear system is presented to show the high efficiency of our HAL sampling method.
    RvS: What is Essential for Offline RL via Supervised Learning?. (arXiv:2112.10751v2 [cs.LG] UPDATED)
    Recent work has shown that supervised learning alone, without temporal difference (TD) learning, can be remarkably effective for offline RL. When does this hold true, and which algorithmic components are necessary? Through extensive experiments, we boil supervised learning for offline RL down to its essential elements. In every environment suite we consider, simply maximizing likelihood with a two-layer feedforward MLP is competitive with state-of-the-art results of substantially more complex methods based on TD learning or sequence modeling with Transformers. Carefully choosing model capacity (e.g., via regularization or architecture) and choosing which information to condition on (e.g., goals or rewards) are critical for performance. These insights serve as a field guide for practitioners doing Reinforcement Learning via Supervised Learning (which we coin "RvS learning"). They also probe the limits of existing RvS methods, which are comparatively weak on random data, and suggest a number of open problems.  ( 2 min )
    Towards Model Agnostic Federated Learning Using Knowledge Distillation. (arXiv:2110.15210v2 [cs.LG] UPDATED)
    Is it possible to design a universal API for federated learning with which an ad-hoc group of data-holders (agents) can collaborate with each other and perform federated learning? Such an API would necessarily need to be model-agnostic, i.e., make no assumption about the model architecture being used by the agents, and also cannot rely on having representative public data at hand. Knowledge distillation (KD) is the obvious tool of choice to design such protocols. However, surprisingly, we show that most natural KD-based federated learning protocols have poor performance. To investigate this, we propose a new theoretical framework, Federated Kernel ridge regression, which can capture both model heterogeneity as well as data heterogeneity. Our analysis shows that the degradation is largely due to a fundamental limitation of knowledge distillation under data heterogeneity. We further validate our framework by analyzing and designing new protocols based on KD. Their performance on real world experiments using neural networks, though still unsatisfactory, closely matches our theoretical predictions.  ( 2 min )
    Benchmarking Graph Neural Networks. (arXiv:2003.00982v4 [cs.LG] UPDATED)
    In the last few years, graph neural networks (GNNs) have become the standard toolkit for analyzing and learning from data on graphs. This emerging field has witnessed an extensive growth of promising techniques that have been applied with success to computer science, mathematics, biology, physics and chemistry. But for any successful field to become mainstream and reliable, benchmarks must be developed to quantify progress. This led us in March 2020 to release a benchmark framework that i) comprises a diverse collection of mathematical and real-world graphs, ii) enables fair model comparison with the same parameter budget to identify key architectures, iii) has an open-source, easy-to-use and reproducible code infrastructure, and iv) is flexible for researchers to experiment with new theoretical ideas. As of May 2022, the GitHub repository has reached 1,800 stars and 339 forks, which demonstrates the utility of the proposed open-source framework through the wide usage by the GNN community. In this paper, we present an updated version of our benchmark with a concise presentation of the aforementioned framework characteristics, an additional medium-sized molecular dataset AQSOL, similar to the popular ZINC, but with a real-world measured chemical target, and discuss how this framework can be leveraged to explore new GNN designs and insights. As a proof of value of our benchmark, we study the case of graph positional encoding (PE) in GNNs, which was introduced with this benchmark and has since spurred interest in exploring more powerful PE for Transformers and GNNs in a robust experimental setting.  ( 3 min )
    Externally Valid Treatment Choice. (arXiv:2205.05561v1 [econ.EM])
    We consider the problem of learning treatment (or policy) rules that are externally valid in the sense that they have welfare guarantees in target populations that are similar to, but possibly different from, the experimental population. We allow for shifts in both the distribution of potential outcomes and covariates between the experimental and target populations. This paper makes two main contributions. First, we provide a formal sense in which policies that maximize social welfare in the experimental population remain optimal for the "worst-case" social welfare when the distribution of potential outcomes (but not covariates) shifts. Hence, policy learning methods that have good regret guarantees in the experimental population, such as empirical welfare maximization, are externally valid with respect to a class of shifts in potential outcomes. Second, we develop methods for policy learning that are robust to shifts in the joint distribution of potential outcomes and covariates. Our methods may be used with experimental or observational data.  ( 2 min )
    AutoKE: An automatic knowledge embedding framework for scientific machine learning. (arXiv:2205.05390v1 [cs.LG])
    Imposing physical constraints on neural networks as a method of knowledge embedding has achieved great progress in solving physical problems described by governing equations. However, for many engineering problems, governing equations often have complex forms, including complex partial derivatives or stochastic physical fields, which results in significant inconveniences from the perspective of implementation. In this paper, a scientific machine learning framework, called AutoKE, is proposed, and a reservoir flow problem is taken as an instance to demonstrate that this framework can effectively automate the process of embedding physical knowledge. In AutoKE, an emulator comprised of deep neural networks (DNNs) is built for predicting the physical variables of interest. An arbitrarily complex equation can be parsed and automatically converted into a computational graph through the equation parser module, and the fitness of the emulator to the governing equation is evaluated via automatic differentiation. Furthermore, the fixed weights in the loss function are substituted with adaptive weights by incorporating the Lagrangian dual method. Neural architecture search (NAS) is also introduced into the AutoKE to select an optimal network architecture of the emulator according to the specific problem. Finally, we apply transfer learning to enhance the scalability of the emulator. In experiments, the framework is verified by a series of physical problems in which it can automatically embed physical knowledge into an emulator without heavy hand-coding. The results demonstrate that the emulator can not only make accurate predictions, but also be applied to similar problems with high efficiency via transfer learning.  ( 2 min )
    Probability Distribution of Hypervolume Improvement in Bi-objective Bayesian Optimization. (arXiv:2205.05505v1 [cs.LG])
    This work provides the exact expression of the probability distribution of the hypervolume improvement (HVI) for the bi-objective generalization of Bayesian optimization. Here, instead of a single-objective improvement, we consider the improvement of the hypervolume indicator concerning the current best approximation of the Pareto front. Gaussian process regression models are trained independently on both objective functions, resulting in a bi-variate separated Gaussian distribution serving as a predictive model for the vector-valued objective function. Some commonly used HVI-based acquisition functions (probability of improvement and upper confidence bound) are also leveraged with the help of the exact distribution of HVI. In addition, we show the superior numerical accuracy and efficiency of the exact distribution compared to the commonly used approximation by Monte-Carlo sampling. Finally, we benchmark distribution-leveraged acquisition functions on the widely applied ZDT problem set, demonstrating a significant advantage of using the exact distribution of HVI in multi-objective Bayesian optimization.  ( 2 min )
    Exploring Local Explanations of Nonlinear Models Using Animated Linear Projections. (arXiv:2205.05359v1 [stat.ML])
    The increased predictive power of nonlinear models comes at the cost of interpretability of its terms. This trade-off has led to the emergence of eXplainable AI (XAI). XAI attempts to shed light on how models use predictors to arrive at a prediction with local explanations, a point estimate of the linear feature importance in the vicinity of one instance. These can be considered linear projections and can be further explored to understand better the interactions between features used to make predictions across the predictive model surface. Here we describe interactive linear interpolation used for exploration at any instance and illustrate with examples with categorical (penguin species, chocolate types) and quantitative (soccer/football salaries, house prices) output. The methods are implemented in the R package cheem, available on CRAN.  ( 2 min )
    Variational Autoencoder Leveraged MMSE Channel Estimation. (arXiv:2205.05345v1 [eess.SP])
    We propose to utilize a variational autoencoder (VAE) for data-driven channel estimation. The underlying true and unknown channel distribution is modeled by the VAE as a conditional Gaussian distribution in a novel way, parameterized by the respective first and second order conditional moments. As a result, it can be observed that the linear minimum mean square error (LMMSE) estimator in its variant conditioned on the latent sample of the VAE approximates an optimal MSE estimator. Furthermore, we argue how a VAE-based channel estimator can approximate the MMSE channel estimator. We propose three variants of VAE estimators that differ in the data used during training and estimation. First, we show that given perfectly known channel state information at the input of the VAE during estimation, which is impractical, we obtain an estimator that can serve as a benchmark result for an estimation scenario. We then propose practically feasible approaches, where perfectly known channel state information is only necessary in the training phase or is not needed at all. Simulation results on 3GPP and QuaDRiGa channel data attest a small performance loss of the practical approaches and the superiority of our VAE approaches in comparison to other related channel estimation methods.  ( 2 min )
    Learning Multitask Gaussian Bayesian Networks. (arXiv:2205.05343v1 [stat.ML])
    Major depressive disorder (MDD) requires study of brain functional connectivity alterations for patients, which can be uncovered by resting-state functional magnetic resonance imaging (rs-fMRI) data. We consider the problem of identifying alterations of brain functional connectivity for a single MDD patient. This is particularly difficult since the amount of data collected during an fMRI scan is too limited to provide sufficient information for individual analysis. Additionally, rs-fMRI data is usually incomplete, sparse, variable, high-dimensional and noisy. To address these problems, we propose a multitask Gaussian Bayesian network (MTGBN) framework capable of identifying individual disease-induced alterations for MDD patients. We assume that such disease-induced alterations show some degree of similarity with the tool to learn such network structures from observations to understanding of how systems are structured jointly from related tasks. First, we treat each patient in a class of observation as a task and then learn the Gaussian Bayesian networks (GBNs) of this data class by learning from all tasks that share a default covariance matrix that encodes prior knowledge. This setting can help us to learn more information from limited data. Next, we derive a closed-form formula of the complete likelihood function and use the Monte-Carlo Expectation-Maximization (MCEM) algorithm to search for the approximately best Bayesian network structures efficiently. Finally, we assess the performance of our methods with simulated and real-world rs-fMRI data.  ( 2 min )
    Analysis of convolutional neural network image classifiers in a rotationally symmetric model. (arXiv:2205.05500v1 [stat.ML])
    Convolutional neural network image classifiers are defined and the rate of convergence of the misclassification risk of the estimates towards the optimal misclassification risk is analyzed. Here we consider images as random variables with values in some functional space, where we only observe discrete samples as function values on some finite grid. Under suitable structural and smoothness assumptions on the functional a posteriori probability, which includes some kind of symmetry against rotation of subparts of the input image, it is shown that least squares plug-in classifiers based on convolutional neural networks are able to circumvent the curse of dimensionality in binary image classification if we neglect a resolution-dependent error term. The finite sample size behavior of the classifier is analyzed by applying it to simulated and real data.  ( 2 min )
    On Distributed Adaptive Optimization with Gradient Compression. (arXiv:2205.05632v1 [stat.ML])
    We study COMP-AMS, a distributed optimization framework based on gradient averaging and the adaptive AMSGrad algorithm. Gradient compression with error feedback is applied to reduce the communication cost in the gradient transmission process. Our convergence analysis of COMP-AMS shows that such a compressed gradient averaging strategy yields the same convergence rate as standard AMSGrad, and also exhibits the linear speedup effect w.r.t. the number of local workers. Compared with recently proposed protocols on distributed adaptive methods, COMP-AMS is simple and convenient. Numerical experiments are conducted to justify the theoretical findings, and demonstrate that the proposed method can achieve the same test accuracy as the full-gradient AMSGrad with substantial communication savings. With its simplicity and efficiency, COMP-AMS can serve as a useful distributed training framework for adaptive gradient methods.  ( 2 min )
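    The idea of gradient compression with error feedback can be illustrated with a minimal sketch. It assumes a top-k sparsifier as the compressor, which the abstract does not specify; the function names `top_k_compress` and `compress_with_error_feedback` are hypothetical and not taken from the paper. Each worker adds the residual left over from earlier rounds to its fresh gradient, transmits only the compressed result, and keeps whatever was dropped for the next round.

```python
from typing import List, Tuple


def top_k_compress(vec: List[float], k: int) -> List[float]:
    """Keep the k largest-magnitude entries of vec and zero out the rest.

    Top-k is an assumed compressor chosen for illustration only.
    """
    keep = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)[:k]
    out = [0.0] * len(vec)
    for i in keep:
        out[i] = vec[i]
    return out


def compress_with_error_feedback(
    grad: List[float], residual: List[float], k: int
) -> Tuple[List[float], List[float]]:
    """Return (message to transmit, residual to carry into the next round)."""
    # Re-inject the error accumulated from previous compressions.
    corrected = [g + r for g, r in zip(grad, residual)]
    sent = top_k_compress(corrected, k)
    # Whatever the compressor dropped is remembered, not discarded.
    new_residual = [c - s for c, s in zip(corrected, sent)]
    return sent, new_residual


# Two rounds on one worker: the entry dropped in round 1 is sent in round 2.
sent1, res1 = compress_with_error_feedback([1.0, -3.0, 0.5], [0.0, 0.0, 0.0], k=1)
sent2, res2 = compress_with_error_feedback([1.0, 0.0, 0.0], res1, k=1)
```

    In a distributed setup, the server would average the `sent` messages from all workers and feed that average to the adaptive optimizer; that step is omitted here.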
    Stable and Interpretable Unrolled Dictionary Learning. (arXiv:2106.00058v4 [cs.LG] UPDATED)
    The dictionary learning problem, representing data as a combination of a few atoms, has long stood as a popular method for learning representations in statistics and signal processing. The most popular dictionary learning algorithm alternates between sparse coding and dictionary update steps, and a rich literature has studied its theoretical convergence. The success of dictionary learning relies on access to a "good" initial estimate of the dictionary and the ability of the sparse coding step to provide an unbiased estimate of the code. The growing popularity of unrolled sparse coding networks has led to the empirical finding that backpropagation through such networks performs dictionary learning. We offer the first theoretical analysis of these empirical results through PUDLE, a Provable Unrolled Dictionary LEarning method. We provide conditions on the network initialization and data distribution sufficient to recover and preserve the support of the latent sparse representation. Additionally, we address two challenges: first, the vanilla unrolled sparse coding computes a biased code estimate, and second, gradients during backpropagated learning can become unstable. We show approaches to reduce the bias of the code estimate in the forward pass, and that of the dictionary estimate in the backward pass. We propose strategies to resolve the learning instability by tuning network parameters and modifying the loss function. Overall, we highlight the impact of loss, unrolling, and backpropagation on convergence. We complement our findings through synthetic and image denoising experiments. Finally, we demonstrate PUDLE's interpretability, a driving factor in designing deep networks based on iterative optimizations, by building a mathematical relation between network weights, its output, and the training set.  ( 2 min )
    A Framework for Machine Learning of Model Error in Dynamical Systems. (arXiv:2107.06658v2 [math.DS] UPDATED)
    The development of data-informed predictive models for dynamical systems is of widespread interest in many disciplines. We present a unifying framework for blending mechanistic and machine-learning approaches to identify dynamical systems from noisily and partially observed data. We compare pure data-driven learning with hybrid models which incorporate imperfect domain knowledge. Our formulation is agnostic to the chosen machine learning model, is presented in both continuous- and discrete-time settings, and is compatible both with model errors that exhibit substantial memory and errors that are memoryless. First, we study memoryless linear (w.r.t. parametric-dependence) model error from a learning theory perspective, defining excess risk and generalization error. For ergodic continuous-time systems, we prove that both excess risk and generalization error are bounded above by terms that diminish with the square-root of T, the time-interval over which training data is specified. Secondly, we study scenarios that benefit from modeling with memory, proving universal approximation theorems for two classes of continuous-time recurrent neural networks (RNNs): both can learn memory-dependent model error. In addition, we connect one class of RNNs to reservoir computing, thereby relating learning of memory-dependent error to recent work on supervised learning between Banach spaces using random features. Numerical results are presented (Lorenz '63, Lorenz '96 Multiscale systems) to compare purely data-driven and hybrid approaches, finding hybrid methods less data-hungry and more parametrically efficient. Finally, we demonstrate numerically how data assimilation can be leveraged to learn hidden dynamics from noisy, partially-observed data, and illustrate challenges in representing memory by this approach, and in the training of such models.  ( 2 min )
    AutoTransfer: Subject Transfer Learning with Censored Representations on Biosignals Data. (arXiv:2112.09796v2 [cs.LG] UPDATED)
    We provide a regularization framework for subject transfer learning in which we seek to train an encoder and classifier to minimize classification loss, subject to a penalty measuring independence between the latent representation and the subject label. We introduce three notions of independence and corresponding penalty terms using mutual information or divergence as a proxy for independence. For each penalty term, we provide several concrete estimation algorithms, using analytic methods as well as neural critic functions. We provide a hands-off strategy for applying this diverse family of regularization algorithms to a new dataset, which we call "AutoTransfer". We evaluate the performance of these individual regularization strategies and our AutoTransfer method on EEG, EMG, and ECoG datasets, showing that these approaches can improve subject transfer learning for challenging real-world datasets.  ( 2 min )
    Contextual Search in the Presence of Adversarial Corruptions. (arXiv:2002.11650v5 [cs.LG] UPDATED)
    We study contextual search, a generalization of binary search in higher dimensions, which captures settings such as feature-based dynamic pricing. Standard formulations of this problem assume that agents act in accordance with a specific homogeneous response model. In practice, however, some responses may be adversarially corrupted. Existing algorithms heavily depend on the assumed response model being (approximately) accurate for all agents and have poor performance in the presence of even a few such arbitrary misspecifications. We initiate the study of contextual search when some of the agents can behave in ways inconsistent with the underlying response model. In particular, we provide two algorithms, one based on multidimensional binary search methods and one based on gradient descent. We show that these algorithms attain near-optimal regret in the absence of adversarial corruptions and their performance degrades gracefully with the number of such agents, providing the first results for contextual search in any adversarial noise model. Our techniques draw inspiration from learning theory, game theory, high-dimensional geometry, and convex analysis.  ( 2 min )
    Eliminating Sharp Minima from SGD with Truncated Heavy-tailed Noise. (arXiv:2102.04297v4 [cs.LG] UPDATED)
    The empirical success of deep learning is often attributed to SGD's mysterious ability to avoid sharp local minima in the loss landscape, as sharp minima are known to lead to poor generalization. Recently, empirical evidence of heavy-tailed gradient noise was reported in many deep learning tasks, and it was shown in Şimşekli (2019a,b) that SGD can escape sharp local minima in the presence of such heavy-tailed gradient noise, providing a partial solution to the mystery. In this work, we analyze a popular variant of SGD where gradients are truncated above a fixed threshold. We show that it achieves a stronger notion of avoiding sharp minima: it can effectively eliminate sharp local minima entirely from its training trajectory. We characterize the dynamics of truncated SGD driven by heavy-tailed noise. First, we show that the truncation threshold and width of the attraction field dictate the order of the first exit time from the associated local minimum. Moreover, when the objective function satisfies appropriate structural conditions, we prove that as the learning rate decreases, the dynamics of heavy-tailed truncated SGD closely resemble those of a continuous-time Markov chain that never visits any sharp minima. Real data experiments on deep learning confirm our theoretical prediction that heavy-tailed SGD with gradient clipping finds "flatter" local minima and achieves better generalization.  ( 2 min )
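    A single step of SGD with truncated gradients can be sketched as follows. This assumes clipping by the Euclidean norm of the gradient, which is one common reading of "truncated above a fixed threshold"; the abstract does not state the exact form, and the names `clip_gradient` and `truncated_sgd_step` are hypothetical.

```python
import math
from typing import List


def clip_gradient(grad: List[float], threshold: float) -> List[float]:
    """Rescale grad so its Euclidean norm is at most threshold."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= threshold:
        # Small gradients pass through unchanged.
        return list(grad)
    scale = threshold / norm
    return [g * scale for g in grad]


def truncated_sgd_step(
    params: List[float], grad: List[float], lr: float, threshold: float
) -> List[float]:
    """One SGD update using the clipped gradient instead of the raw one."""
    clipped = clip_gradient(grad, threshold)
    return [p - lr * c for p, c in zip(params, clipped)]


# A large gradient (norm 5) is scaled down to norm 1 before the update.
clipped = clip_gradient([3.0, 4.0], threshold=1.0)
updated = truncated_sgd_step([0.0, 0.0], [3.0, 4.0], lr=0.5, threshold=10.0)
```

    Because the step length is bounded by `lr * threshold`, a single heavy-tailed noise spike cannot move the iterate arbitrarily far.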
    DoubleMatch: Improving Semi-Supervised Learning with Self-Supervision. (arXiv:2205.05575v1 [cs.LG])
    Following the success of supervised learning, semi-supervised learning (SSL) is now becoming increasingly popular. SSL is a family of methods, which in addition to a labeled training set, also use a sizable collection of unlabeled data for fitting a model. Most of the recent successful SSL methods are based on pseudo-labeling approaches: letting confident model predictions act as training labels. While these methods have shown impressive results on many benchmark datasets, a drawback of this approach is that not all unlabeled data are used during training. We propose a new SSL algorithm, DoubleMatch, which combines the pseudo-labeling technique with a self-supervised loss, enabling the model to utilize all unlabeled data in the training process. We show that this method achieves state-of-the-art accuracies on multiple benchmark datasets while also reducing training times compared to existing SSL methods. Code is available at https://github.com/walline/doublematch.  ( 2 min )
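    The pseudo-labeling step described above, and the drawback it causes, can be illustrated with a small sketch. The confidence threshold of 0.95 and the function name `pseudo_labels` are assumptions made for illustration, not values or code from the paper.

```python
from typing import List, Tuple


def pseudo_labels(
    probs: List[List[float]], threshold: float = 0.95
) -> List[Tuple[int, int]]:
    """Return (sample index, predicted class) for confident predictions only.

    probs[i] holds the model's class probabilities for unlabeled sample i.
    """
    selected = []
    for idx, p in enumerate(probs):
        # The predicted class is the one with the highest probability.
        label = max(range(len(p)), key=lambda c: p[c])
        if p[label] >= threshold:
            selected.append((idx, label))
    return selected


# Sample 1 is below the threshold, so it contributes nothing to training.
predictions = [[0.97, 0.03], [0.60, 0.40], [0.01, 0.99]]
confident = pseudo_labels(predictions)
```

    The unlabeled samples that fall below the threshold are exactly the ones a pure pseudo-labeling method leaves unused, which is the gap the additional self-supervised loss is meant to close.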
    A Unified f-divergence Framework Generalizing VAE and GAN. (arXiv:2205.05214v1 [stat.ML])
    Developing deep generative models that flexibly incorporate diverse measures of probability distance is an important area of research. Here we develop a unified mathematical framework of f-divergence generative model, f-GM, that incorporates both VAE and f-GAN, and enables tractable learning with general f-divergences. f-GM allows the experimenter to flexibly design the f-divergence function without changing the structure of the networks or the learning procedure. f-GM jointly models three components: a generator, an inference network and a density estimator. Therefore it simultaneously enables sampling, posterior inference of the latent variable as well as evaluation of the likelihood of an arbitrary datum. f-GM belongs to the class of encoder-decoder GANs: our density estimator can be interpreted as playing the role of a discriminator between samples in the joint space of latent code and observed space. We prove that f-GM naturally simplifies to the standard VAE and to f-GAN as special cases, and illustrate the connections between different encoder-decoder GAN architectures. f-GM is compatible with general network architectures and optimizers. We leverage it to experimentally explore the effects -- e.g. mode collapse and image sharpness -- of different choices of f-divergence.  ( 2 min )
    Regression-based projection for learning Mori--Zwanzig operators. (arXiv:2205.05135v1 [math.DS])
    We propose to adopt statistical regression as the projection operator to enable data-driven learning of the operators in the Mori--Zwanzig formalism. We present a principled algorithm to extract the Markov and memory operators for any regression model. We show that the choice of linear regression results in a recently proposed data-driven learning algorithm based on Mori's projection operator, which can be considered as a higher-order approximate Koopman learning method. We show that more expressive, potentially nonlinear regression models naturally fill in the gap between the highly idealized and computationally efficient Mori's projection operator and the optimal yet computationally infeasible Zwanzig projection operator. We performed numerical experiments and extracted the operators for an array of regression-based projections, including linear, polynomial, spline, and neural-network-based regression, showing a progressive improvement as the complexity of the regression model increased. Our proposition provides a general framework to extract memory-dependent corrections and can be readily applied to an array of data-driven learning methods for stationary dynamical systems in the literature.  ( 2 min )
    Bias and Priors in Machine Learning Calibrations for High Energy Physics. (arXiv:2205.05084v1 [hep-ph])
    Machine learning offers an exciting opportunity to improve the calibration of nearly all reconstructed objects in high-energy physics detectors. However, machine learning approaches often depend on the spectra of examples used during training, an issue known as prior dependence. This is an undesirable property of a calibration, which needs to be applicable in a variety of environments. The purpose of this paper is to explicitly highlight the prior dependence of some machine learning-based calibration strategies. We demonstrate how some recent proposals for both simulation-based and data-based calibrations inherit properties of the sample used for training, which can result in biases for downstream analyses. In the case of simulation-based calibration, we argue that our recently proposed Gaussian Ansatz approach can avoid some of the pitfalls of prior dependence, whereas prior-independent data-based calibration remains an open problem.  ( 2 min )
    Generalized Fast Multichannel Nonnegative Matrix Factorization Based on Gaussian Scale Mixtures for Blind Source Separation. (arXiv:2205.05330v1 [cs.SD])
    This paper describes heavy-tailed extensions of a state-of-the-art versatile blind source separation method called fast multichannel nonnegative matrix factorization (FastMNMF) from a unified point of view. The common way of deriving such an extension is to replace the multivariate complex Gaussian distribution in the likelihood function with its heavy-tailed generalization, e.g., the multivariate complex Student's t and leptokurtic generalized Gaussian distributions, and tailor-make the corresponding parameter optimization algorithm. Using a wider class of heavy-tailed distributions called a Gaussian scale mixture (GSM), i.e., a mixture of Gaussian distributions whose variances are perturbed by positive random scalars called impulse variables, we propose GSM-FastMNMF and develop an expectation-maximization algorithm that works even when the probability density function of the impulse variables has no analytical expression. We show that existing heavy-tailed FastMNMF extensions are instances of GSM-FastMNMF and derive a new instance based on the generalized hyperbolic distribution that includes the normal-inverse Gaussian, Student's t, and Gaussian distributions as special cases. Our experiments show that the normal-inverse Gaussian FastMNMF outperforms the state-of-the-art FastMNMF extensions and the ILRMA model in speech enhancement and separation in terms of the signal-to-distortion ratio.  ( 2 min )

  • Open

    MindsEye Lite: run multiple text-to-image models in one place
    submitted by /u/Illustrious_Row_9971 [link] [comments]
    Terraform Provider Iterative (TPI) Helps ML/AI Teams Save Resources And Money While Switching Cloud Providers
    submitted by /u/thumbsdrivesmecrazy [link] [comments]
    What is the best type of learning for an AI to play a card game?
    I'm working on my project and I'm not sure which type of learning is best for this kind of game. What do you think? submitted by /u/Merelex_Buk-12 [link] [comments]
    Aimlflow - Aim-based experiment comparison over your MLflow logs
    🙌 Hey all, check out our work on AiMLflow! We are building a tool that mounts the MLflow logs and enables an Aim-based super-performant UI for metric, image and other ML metadata comparison. If you are using MLflow, we would love to chat with you and share our progress on the project for feedback. https://aimstack.io/aimlflow submitted by /u/ManeSa [link] [comments]  ( 1 min )
    This UBIAI article and tutorial explains the predictions of a named entity recognition (NER) model using local interpretable model-agnostic explanations (LIME). Check out more articles and tutorials at: ubiai.tools
    submitted by /u/UBIAI [link] [comments]  ( 1 min )
    Use Of AI To Reduce Street Crimes And Improve Civil Security
    In this digital era, several law enforcement agencies across the globe are leveraging artificial intelligence (AI) to resolve more criminal cases in less time, using AI algorithms developed to identify, locate and arrest actual or potential criminals faster than ever. Read more submitted by /u/ridamughal110 [link] [comments]  ( 1 min )
    Is AI eco-friendly?
    Artificial Intelligence (AI) systems can be used to reduce carbon emissions and eventually environmental degradation. It has the potential to understand the surroundings autonomously, learn from experience, make and implement decisions, and communicate with the surroundings. Read more submitted by /u/ridamughal110 [link] [comments]
    Interview with a computer scientist who works on AI & social/crowd computing. We speak about her career & life. If you are interested please subscribe as I'll be posting more AI interviews soon! :)
    submitted by /u/joemurray1994 [link] [comments]  ( 1 min )
    What is the difference between face detection and face recognition from an iOS perspective?
    submitted by /u/adilonreddit1 [link] [comments]  ( 1 min )
    Awesome Artificial Intelligence content (with code!) on Computer Vision News of May 2022
    Dear all, Awesome AI R&D content (with code!) on Computer Vision News of May 2022. Many great articles about AI, Deep Learning, Computer Vision and more... Review of award-winning ISBI 2022 paper. HTML5 version (recommended) PDF version Dilbert on page 2. Free subscription on page 60. Enjoy! https://preview.redd.it/gcc0qyxozuy81.jpg?width=400&format=pjpg&auto=webp&s=3a5d7913b9cc70bf25762d05f724115fd04d0123 submitted by /u/Gletta [link] [comments]  ( 1 min )
    AR/VR: The Next Frontier in Banking and Financial Services
    https://blog.r2c.io/ar-vr-the-next-frontier-in-banking-and-financial-services/ submitted by /u/R2Consulting [link] [comments]
    Some results look interesting!
    submitted by /u/limapedro [link] [comments]
    100+ Cheat Sheets for Machine Learning, Deep Learning and Data Science
    submitted by /u/vadhavaniyafaijan [link] [comments]
    Speech Recognition with TensorFlow.js - Voice Commands
    submitted by /u/RubiksCodeNMZ [link] [comments]
    The results of the AI experiment/survey I conducted on this sub a short time ago are here (link to the full study in the comment)
    submitted by /u/KazRainer [link] [comments]  ( 1 min )
    [P] PaddleOCR: An awesome and easy-to-use OCR system, which provides more than 80 kinds of multi-language recognition models.
    Hi, all, I am glad to share an open source repository, PaddleOCR, which provides more than 80 kinds of multi-language recognition models, including English, Chinese, French, German, Arabic, Korean, Japanese and so on. Code: https://github.com/PaddlePaddle/PaddleOCR Features: ultra-lightweight OCR system: detection (3.6M) + direction classifier (1.4M) + recognition (12M) = 17.0M; support for more than 80 kinds of multi-language recognition models, including English, Chinese, French, German, Arabic, Korean, Japanese and so on; semi-automatic data annotation tool PPOCRLabel: supports rectangular box, irregular text, table and key information annotation modes; data synthesis tool, i.e., Style-Text: easy to synthesize a large number of images which are similar to the target scene image; supports PIP installation, easy to use; supports Linux, Windows, MacOS and other systems; Apache-2.0 license. https://preview.redd.it/dkjcp8ln2ty81.png?width=795&format=png&auto=webp&s=e3751349712c7006c94e744f240f9207ad8d1b2d https://preview.redd.it/i95vto1p2ty81.png?width=795&format=png&auto=webp&s=1a1c3b58ccfb5f8fc403ba387a3a9fda6587e33d https://preview.redd.it/swlmsylm2ty81.png?width=795&format=png&auto=webp&s=a5567cbaa76e395330f75e9eba09100655974725 submitted by /u/Aha_IamDaniel [link] [comments]  ( 1 min )
    MELODIES POSITIVE: Bottle Art.
    submitted by /u/cookingandcraft [link] [comments]
    Self-hosted 'replika-like' chatbot using VA Framework and Oryzer Studio
    I purchased VA Framework Standard Edition at a great price during the Steam Holiday sale at Christmas and played around with it a bit when I first got it, but it has been sitting idle since then and I decided to start messing with it again. What attracted me to this was being combined with Oryzer Studio and its flow-based programming and already being attached to a 3D avatar interface with speech engine. What I am trying to do is create a personal self-hosted 'replika-like' chatbot that can be trained and learn from my input. Based on my research and communication with the developer prior to purchasing this, what I am hoping to accomplish should easily be obtainable using this. The problem is that the documentation and sample projects available are very sparse and what is available seems to be geared more towards corporate customer systems rather than personal systems. I have limited programming experience, mostly Visual Basic from when I was in college, which is why Oryzer studio seemed perfect for me. What I am hoping is that there is someone else who has experience with it and may be able to assist with getting me started or at least point me in the direction of some documentation and/or sample projects, other than the ones provided on the developer's website, that I can build off of. VA Framework, which I purchased, is the 3D Avatar and speech engine and Oryzer Studio, which is free, is used to create the 'workspace' files used by VA Framework. Both are designed to work together, although Oryzer Studio does not require VA Framework to function and is freely available to anyone who wanted to look at it. I thank you in advance for any assistance you can provide submitted by /u/Normmstein [link] [comments]  ( 1 min )
    How to Make Slow Motion Videos With AI ! TimeLens Explained
    submitted by /u/OnlyProggingForFun [link] [comments]
  • Open

    [D] What is the best machine learning algorithm to classify seemingly random data?
    I am a computer science student. For my bachelor's internship I have to research the possibility of predicting whether a new client will buy a house or not based on their website activity/behaviour, so the real-estate company can prioritise the most likely buyers. The real-estate company assured me they had loads of data, but nope. I have data on about 1400 clients, of whom 150 bought a house. When we learn about new algorithms we usually have a perfect dataset to work with, but this is the real world where everything is chaos. I've tried KNN, SVM, Gaussian mixture and XGBoost, but they all underperform because there is very little data and all of it is seemingly random. So what is the best machine learning algorithm to classify seemingly random data? Or is it impossible to get any reasonable prediction out of it? For my research it is, however, perfectly acceptable to conclude that it is impossible to predict anything given these circumstances. Thanks in advance! submitted by /u/Fine-Coffee6171 [link] [comments]  ( 2 min )
    [D] Any recommendations for image annotation software?
    I've been using Supervisely Enterprise at work and it's been going really, really well, but my work has been paying for it. Meanwhile, my grad research work requires a large-scale annotation effort and my advisor is queasy about forking over grant money just to annotate. We can't use the community version because of HIPAA. I've been looking at other free annotation software and I'm trying to make a decision on which to use. Any suggestions on what has worked for you? submitted by /u/aygupt1822 [link] [comments]  ( 1 min )
    [D] I’m writing an MSc thesis on synthetic data for image classification. I am wondering which kinds of data visualization techniques (of real data)/EDA I could add.
    The task is recognition of jersey numbers on the backs of sports players. The dataset contains the picture, the label and a bounding box containing the jersey number. The only ideas I had were plotting the distribution of the bounding box in the image and the histogram of labels. I am not sure if it makes sense to plot the distribution of image resolutions. submitted by /u/TheManveru [link] [comments]  ( 2 min )
    [P] Nftopia.ai - Visual semantic search for NFTs
    Hi! My colleagues & I indexed about 1.1 million NFT images from across 21 thousand collections using OpenAI's CLIP and put it behind a website. Please let us know what you think! Specifically, we embed all these images using clip & expose two functionalities. The search bar embeds the input query & retrieves similar images. The "+more like this" is image search that retrieves other NFTs similar to it in the image space. We use Pinecone to power the approximate nearest neighbor search. The web stack uses Django in the front + flask app that interfaces with Pinecone. (Warning: we've noticed several NSFW NFTs that can pop up occasionally even for unrelated search queries) submitted by /u/hayAbhay [link] [comments]  ( 1 min )
    [N] Awesome AI R&D content (with code!) on Computer Vision News of May 2022
    Dear all, Awesome R&D content (with code!) on Computer Vision News of May 2022. Many great articles about AI, Deep Learning, Computer Vision and more... Review of award-winning ISBI 2022 paper. HTML5 version (recommended) PDF version Dilbert on page 2. Free subscription on page 60. Enjoy! https://preview.redd.it/cdg07kpfnuy81.jpg?width=400&format=pjpg&auto=webp&s=9ff68e58648a0dafc95b9416bc1a7e194ab1b898 submitted by /u/Gletta [link] [comments]  ( 1 min )
    [D] Can you recommend this book? Analytical Skills for AI and Data Science
    Recently, I have been stumbling across this book frequently: https://www.oreilly.com/library/view/analytical-skills-for/9781492060932/. I am thinking about buying it. Are there people here who own this book and can give a recommendation? submitted by /u/rightkill [link] [comments]  ( 1 min )
    [P] Recognize similar nodes in chronologically evolving graph
    I’ve collected a data set of around 5 million unique ids which transact values with each other in one-directional ways at some point in time. I understand that this can easily be visualized as a graph where each node is an id and each interaction is a weighted edge assigned a tuple (value, time). I now want to build a model that finds similarly behaving nodes. The goal is to look at the current state of the graph and use the model to predict what’s likely to happen with nodes that have joined the graph relatively late (i.e., whose first transaction isn't old). I'm sure there has been work done that can give me hints on how to do this. Are there papers/models you would recommend looking at? submitted by /u/SomewhereOld6859 [link] [comments]  ( 1 min )
    [R] [CVPR 2022 Oral] Learning Multi-View Aggregation In the Wild for Large-Scale 3D Semantic Segmentation
    We present an end-to-end deep view aggregation method for 3D semantic segmentation from images and point clouds. We reach SOTA on S3DIS and KITTI360 without requiring point cloud colorization, meshing, or depth sensors: just point clouds, images, and their poses. preprint | code | paperwithcode submitted by /u/grad_student_descent [link] [comments]  ( 1 min )
    [P] I wrote an article on Uplift Modeling to target the "persuadables"
    Hey everyone! As the title says, I recently wrote a technical article where I built an uplift model to increase marketing ROIs by targeting the right group of people (the persuadables). Here's the link: https://towardsdatascience.com/targeting-the-right-group-with-uplift-modelling-5682de2dff8b Curious to know if anyone has successfully used uplift modeling in their industry or field? And let me know what you think. Any feedback is greatly appreciated! submitted by /u/ramanansubramanian [link] [comments]  ( 1 min )
    [D] Is there a tool to visualise my neural network in real time?
    I would like to know if there is any tool that I can use to visualise my NN in real time, noting that I use Keras. I want something like what is shown in this video screenshot: https://preview.redd.it/yut97aa7ysy81.png?width=1969&format=png&auto=webp&s=6704667a483810d1614c90075335faa75e1ec0e7 submitted by /u/aymenboufe [link] [comments]  ( 1 min )
    [D] Is there any way to artificially create a probability calibration for data coming from another model?
    I have probability predictions, which come from a survival model, this model gives me very low probabilities, and I am not sure if they fulfill the real probability of the phenomenon. For example, I calculate P(T≤t+d|T>t) and the probabilities are very low (with d=180). To summarize, I need these probabilities to be on average another number (let's say 0.2). Is it possible to create an artificial calibration with only this number (the desired average) as the input? I have thought of creating a vector of size n equal to the size of the original dataset with distribution Xi∼Bernoulli(p=0.2) and assign its ones to the top np probabilities and its zeros to the latest n(1−p). Which would result in a table with a column of probabilities obtained with the survival model and another column with a 0 or 1 depending on the said probability. After getting this table, I would simply use CalibratedClassifierCV from scikit-learn. Could this be the correct way? submitted by /u/Jhones_edlc [link] [comments]  ( 1 min )
    [D] Fastest way to calculate distance (drift) between vectors - at scale (billions)
    I'm looking for the fastest way to calculate distance (drift) between vectors - at scale (billions). What do I mean? For example, there are two sets of 1 billion vectors each. I want to know how far/different one of the sets is from the other. Each vector has 768 dimensions. submitted by /u/igaloly [link] [comments]  ( 2 min )
    [R][D] Cluster Sentences
    I have a use case where I need to cluster/combine related sentences together. For example: Sentences https://preview.redd.it/suq97dy5bsy81.png?width=683&format=png&auto=webp&s=7d2a3168e755a8fb390e384481f4a7e37a84966f Related sentences together https://preview.redd.it/absdu995bsy81.png?width=683&format=png&auto=webp&s=5b9285b3b0d1203e3e50457447a1086cae9c821c Constraint: We do not know the number of clusters beforehand (K-Means clustering is not very useful here). The following does not give a satisfactory solution: https://ntropy.com/post/clustering-millions-of-sentences-to-optimize-the-ml-workflow Can anyone please help with an approach or pointers? submitted by /u/Expert-Departure-236 [link] [comments]
    [R] Full-batch GD generalizes better than SGD
    This paper https://arxiv.org/abs/2204.12446 shows the dependence of the generalization error on the optimization error. For smooth losses (examples: log-sum-exp, or smoothed leaky ReLU activation functions), full-batch GD generalizes better than SGD (or at least compared to known generalization error bounds). Additionally, in the over-parametrized regime (exact fit) the Polyak-Lojasiewicz condition holds (https://arxiv.org/abs/2003.00307), and T = C log(n) iterations are required to achieve generalization and excess risk of the order 1/n^2. In practice, this should also translate to "train longer to generalize better" and "increase the batch size to generalize better". See also the tables of results below. https://preview.redd.it/6ska4l52mry81.jpg?width=1165&format=pjpg&auto=webp&s=222043c8152761f89c6855fd26cba0b4f88443e7 https://preview.redd.it/k038am52mry81.jpg?width=1184&format=pjpg&auto=webp&s=6155f8a5022199dec0c0c3341a962d05825cc867 submitted by /u/chaotic_shadow4444 [link] [comments]  ( 4 min )
    [P] Music mixing and AI
    I was wondering what would be the best approach to develop an AI agent that automatically mixes music with sound and mix effects, i.e. an artificial DJ? submitted by /u/big_dataFitness [link] [comments]  ( 1 min )
  • Open

    Animo Island, a PC game that empowers players to explore reinforcement learning as a game mechanic
    Transitional Forms presents to you, Animo Island, a rogue-lite base builder that utilizes the power of reinforcement learning to enable YOU to perform and explore complex behaviours from training cute little agents called Animo! Through the process of rewarding and training your Animo, you can teach it to perform actions on the island from which it can learn to make better decisions over time. You can customize and train your Animo to gather crystals, pick apples, plant flowers...sky’s the limit as you explore new terrain and emergent behaviours! Animo Island is teeming with potential for you to create an ever changing and autonomous paradise for your Animo’s to cultivate and grow. Through cute, loveable characters and playful environments, Animo Island invites you to create empathetic connections with creative machine intelligence in a way that has never been done before. Join our Discord community to learn more about Animo Island and get all the latest updates and information! We are currently looking for passionate folks to be play testers for our upcoming beta release, make sure to drop us a line in Discord if you’re interested! Link to Discord: https://discord.gg/rkSXhKyDVR Subreddit: r/AnimoIsland submitted by /u/AnimoIsland [link] [comments]  ( 1 min )
    Basic question about importance sampling for off-policy Monte Carlo prediction.
    submitted by /u/jamespherman [link] [comments]  ( 1 min )
    How to set up a Gym environment for a two player game like Nine Men's Morris
    I'm working on an RL agent which plays the game Nine Men's Morris against a human player (and hopefully wins most of the time). At least that is the goal. Right now I'm struggling with the game logic itself and setting up an appropriate environment with gym. Question 1: If an action closes a mill, the same player has to take another action (remove one of the opponent's pieces) - how do I set this up in the step() function? Currently, I have two ideas: inside the step function, require another action from the agent, or change the action space so that every action contains an additional parameter of where to take a piece in case of a mill. The latter seems cleaner but also increases my action space further ... Question 2: The observation is a 7 x 7 player board where a 1 marks player 1, a -1 marks player 2 and zero is an empty field. There are 600 possible actions (place a piece on one of 24 legal points, take a piece from one of 24 legal points, move a piece between two points in one of 24 * 24 - 24 ways). Is there any way to reduce the action space? Is 600 too large? Should my agent know what actions are legal or should it learn this? Question 3: My current understanding of gym is that the entire game logic is behind the step(action) function. If the agent chooses an illegal action, the game is immediately terminated and the agent receives the lose reward. Is this correct? Question 4: My first idea for training the agent is to invert the player board after each turn, so the agent plays against itself. I've read that this is probably unstable and won't produce great results, but as a first step it would probably be enough. Another idea is to play against an older version of the agent so that it gets better over time. Do these ideas seem reasonable? Furthermore, any resource (blog article, paper, tutorial ...) on this topic (using RL on two-player games) would be greatly appreciated. submitted by /u/house_92 [link] [comments]  ( 2 min )
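    The 600-action count in Question 2 (24 placements + 24 removals + 24 * 24 - 24 moves) can be made concrete with a flat index encoding. The sketch below is one possible layout, not the poster's code: the names `encode_move`, `decode_action` and `placement_mask` are hypothetical, and the board is assumed to be a flat list of 24 points (values 1, -1, 0) rather than the 7 x 7 grid described in the post.

```python
N_POINTS = 24                             # legal points on the board
N_PLACE = N_POINTS                        # place a piece on a point
N_REMOVE = N_POINTS                       # remove an opponent piece from a point
N_MOVE = N_POINTS * (N_POINTS - 1)        # ordered (src, dst) pairs with src != dst
N_ACTIONS = N_PLACE + N_REMOVE + N_MOVE   # 24 + 24 + 552 = 600


def encode_move(src, dst):
    """Map a (src, dst) move with src != dst to an index in [48, 600)."""
    if src == dst:
        raise ValueError("a move needs two different points")
    # Skip the diagonal (src == dst) so every index is used exactly once.
    offset = dst if dst < src else dst - 1
    return N_PLACE + N_REMOVE + src * (N_POINTS - 1) + offset


def decode_action(index):
    """Inverse mapping: flat index -> ('place'|'remove', point) or ('move', src, dst)."""
    if not 0 <= index < N_ACTIONS:
        raise ValueError("action index out of range")
    if index < N_PLACE:
        return ("place", index)
    if index < N_PLACE + N_REMOVE:
        return ("remove", index - N_PLACE)
    src, offset = divmod(index - N_PLACE - N_REMOVE, N_POINTS - 1)
    dst = offset if offset < src else offset + 1
    return ("move", src, dst)


def placement_mask(board):
    """Legal-action mask for the placing phase only: empty points may be filled.

    `board` is an assumed flat list of 24 values in {1, -1, 0}.
    """
    mask = [False] * N_ACTIONS
    for point, owner in enumerate(board):
        if owner == 0:
            mask[point] = True
    return mask
```

    A mask like this is one way to tell the agent which actions are legal instead of terminating the episode on an illegal move; whether masking or penalising works better for this game is left open here.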
    "Data Distributional Properties Drive Emergent Few-Shot Learning in Transformers", Chan et al 2022
    submitted by /u/gwern [link] [comments]  ( 1 min )
    Changing the observation space from real valued quantities to visual obs
    I would like to change the observation of the MAP environment. If you look at the code (https://github.com/openai/multiagent-particle-envs/blob/47e9ee38e605f8a563370b3c7e52a349eca3f6b1/multiagent/environment.py#L69), this is how it is initialized: self.observation_space.append(spaces.Box(low=-np.inf, high=+np.inf, shape=(obs_dim,), dtype=np.float32)) Then, if you look at one of the actual envs, for example simple spread, this (https://github.com/openai/multiagent-particle-envs/blob/47e9ee38e605f8a563370b3c7e52a349eca3f6b1/multiagent/scenarios/simple_spread.py#L100) is what at each step the env gives as an observation to the agent: return np.concatenate([agent.state.p_vel] + [agent.state.p_pos] + entity_pos + other_pos + comm) Is there a way I can add an image of the environment to this np array? I would like my agent to also receive a visual obs. submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
    On the Verge of Solving Rocket League using Deep Reinforcement Learning and Sim-to-sim Transfer
    Paper: https://arxiv.org/abs/2205.05061 Videos: https://www.youtube.com/watch?v=8k9FNxIU0KQ Github: Coming soon Playlist: https://www.youtube.com/watch?v=WXMHJszkz6M&list=PL2KGNY2Ei3ix7Vr_vA-ZgCyVfOCfhbX0C submitted by /u/LilHairdy [link] [comments]  ( 1 min )
    Papers explaining the limitations of Q-learning and the deep Q-network
    Hi! I am currently working on my master's thesis, which includes custom-made environments for Q-learning and a deep Q-network from Mnih et al. (2015). One of the limitations of my custom-made environment is that the state and action spaces grow rapidly. For the Q-learning solution, the size of the Q-table becomes enormous, with millions to billions of possible states, while the deep Q-network suffers from catastrophic forgetting. Furthermore, the size of the replay memory (1 million experiences) is not enough to efficiently deal with the rapid increase in state space. I know that "Implementing the Deep Q-Network" by Melrose Roderick, James MacGlashan, and Stefanie Tellex is a good paper explaining the limitations of DQN. My question is, does anyone know of more good papers like this? I am also interested in papers explaining the limitations of Q-learning. Thank you for any help! submitted by /u/Sondreeo [link] [comments]  ( 1 min )
  • Open

    Language Models Perform Reasoning via Chain of Thought
    Posted by Jason Wei and Denny Zhou, Research Scientists, Google Research, Brain team In recent years, scaling up the size of language models has been shown to be a reliable way to improve performance on a range of natural language processing (NLP) tasks. Today’s language models at the scale of 100B or more parameters achieve strong performance on tasks like sentiment analysis and machine translation, even with little or no training examples. Even the largest language models, however, can still struggle with certain multi-step reasoning tasks, such as math word problems and commonsense reasoning. How might we enable language models to perform such reasoning tasks? In “Chain of Thought Prompting Elicits Reasoning in Large Language Models,” we explore a prompting method for improving the re…  ( 6 min )
    Unlocking Zero-Resource Machine Translation to Support New Languages in Google Translate
    Posted by Isaac Caswell and Ankur Bapna, Research Scientists, Google Translate Machine translation (MT) technology has made significant advances in recent years, as deep learning has been integrated with natural language processing (NLP). Performance on research benchmarks like WMT have soared, and translation services have improved in quality and expanded to include new languages. Nevertheless, while existing translation services cover languages spoken by the majority of people world wide, they only include around 100 languages in total, just over 1% of those actively spoken globally. Moreover, the languages that are currently represented are overwhelmingly European, largely overlooking regions of high linguistic diversity, like Africa and the Americas. There are two key bottlenecks t…  ( 9 min )
  • Open

    Achieve in-vehicle comfort using personalized machine learning and Amazon SageMaker
    This blog post is co-written by Rudra Hota and Esaias Pech from Continental AG. Many drivers have had the experience of trying to adjust temperature settings in their vehicle while attempting to keep their eyes on the road. Whether the previous driver preferred a warmer cabin temperature, or you’re now wearing warmer clothing, or the […]  ( 8 min )
  • Open

    New Twitter account: ElementFact
    I started a new Twitter account this morning: @ElementFact. I’m thinking the account will post things like scientific facts about each element but also some history around how the element was discovered and named and other lore associated with the element. We’ll see how this goes. I’ve started many Twitter accounts over the years. Some […] New Twitter account: ElementFact first appeared on John D. Cook.  ( 1 min )
  • Open

    Transfer Learning — Part — 6.1!! Implementing Mobilenet in Keras
    In Part 6.0 of the Transfer Learning series we discussed the MobileNet pre-trained model in depth, so in this series we will…  ( 89 min )
  • Open

    DSC Weekly Newsletter 10 May 2022: Data Meshes, Digital Twins, and Knowledge Graphs
    There’s an underlying theme to this week’s articles, which is a curious occurrence given that so much of our content is user-driven. That theme is the Value of Data. There is a tendency when looking at data in its various incarnations to view all data as somehow being valuable. Realistically, without rolling up sleeves and… The post DSC Weekly Newsletter 10 May 2022: Data Meshes, Digital Twins, and Knowledge Graphs appeared first on Data Science Central.  ( 3 min )
  • Open

    Learning Fast, Learning Slow: A General Continual Learning Method based on Complementary Learning System. (arXiv:2201.12604v2 [cs.LG] UPDATED)
    Humans excel at continually learning from an ever-changing environment whereas it remains a challenge for deep neural networks which exhibit catastrophic forgetting. The complementary learning system (CLS) theory suggests that the interplay between rapid instance-based learning and slow structured learning in the brain is crucial for accumulating and retaining knowledge. Here, we propose CLS-ER, a novel dual memory experience replay (ER) method which maintains short-term and long-term semantic memories that interact with the episodic memory. Our method employs an effective replay mechanism whereby new knowledge is acquired while aligning the decision boundaries with the semantic memories. CLS-ER does not utilize the task boundaries or make any assumption about the distribution of the data which makes it versatile and suited for "general continual learning". Our approach achieves state-of-the-art performance on standard benchmarks as well as more realistic general continual learning settings.  ( 2 min )
    Application of Transfer Learning and Ensemble Learning in Image-level Classification for Breast Histopathology. (arXiv:2204.08311v2 [cs.CV] UPDATED)
    Background: Breast cancer has the highest prevalence in women globally. The classification and diagnosis of breast cancer and its histopathological images have always been a hot spot of clinical concern. In Computer-Aided Diagnosis (CAD), traditional classification models mostly use a single network to extract features, which has significant limitations. On the other hand, many networks are trained and optimized on patient-level datasets, ignoring the application of lower-level data labels. Method: This paper proposes a deep ensemble model based on image-level labels for the binary classification of benign and malignant lesions of breast histopathological images. First, the BreaKHis dataset is randomly divided into training, validation and test sets. Then, data augmentation techniques are used to balance the number of benign and malignant samples. Third, considering the performance of transfer learning and the complementarity between the networks, VGG16, Xception, ResNet50, and DenseNet201 are selected as the base classifiers. Result: In the ensemble network model with accuracy as the weight, the image-level binary classification achieves an accuracy of $98.90\%$. In order to verify the capabilities of our method, the latest Transformer and Multilayer Perceptron (MLP) models have been experimentally compared on the same dataset. Our model wins with a $5\%-20\%$ advantage, emphasizing the ensemble model's far-reaching significance in classification tasks. Conclusion: This research focuses on improving the model's classification performance with an ensemble algorithm. Transfer learning plays an essential role in small datasets, improving training speed and accuracy. Our model has outperformed many existing approaches in accuracy, providing a method for the field of auxiliary medical diagnosis.  ( 2 min )
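    The abstract above mentions an "ensemble network model with accuracy as the weight" without giving details. One common reading is accuracy-weighted soft voting, sketched below; this is an assumed interpretation and not the paper's implementation. The function name `accuracy_weighted_vote` and all numbers in the usage lines are hypothetical.

```python
def accuracy_weighted_vote(probs_per_model, accuracies):
    """Combine per-model class probabilities, weighting each model by its accuracy.

    probs_per_model: one probability list per base classifier, all for the same sample.
    accuracies: one validation accuracy per base classifier (assumed to be positive).
    Returns (predicted_class_index, combined_probabilities).
    """
    if len(probs_per_model) != len(accuracies):
        raise ValueError("need one accuracy per model")
    total = sum(accuracies)
    weights = [a / total for a in accuracies]   # normalise so the weights sum to 1
    n_classes = len(probs_per_model[0])
    combined = [
        sum(w * p[c] for w, p in zip(weights, probs_per_model))
        for c in range(n_classes)
    ]
    label = max(range(n_classes), key=combined.__getitem__)
    return label, combined


if __name__ == "__main__":
    # Hypothetical outputs of three base classifiers for one image:
    # index 0 = benign, index 1 = malignant.
    probs = [[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]]
    accs = [0.9, 0.8, 0.7]   # hypothetical validation accuracies
    print(accuracy_weighted_vote(probs, accs))
```

    Normalising the weights keeps the combined vector a valid probability distribution, so the ensemble output can be thresholded like any single model's output.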
    A Robust and Flexible EM Algorithm for Mixtures of Elliptical Distributions with Missing Data. (arXiv:2201.12020v2 [stat.ML] UPDATED)
    This paper tackles the problem of missing data imputation for noisy and non-Gaussian data. A classical imputation method, the Expectation Maximization (EM) algorithm for Gaussian mixture models, has shown interesting properties when compared to other popular approaches such as those based on k-nearest neighbors or on multiple imputations by chained equations. However, Gaussian mixture models are known to be non-robust to heterogeneous data, which can lead to poor estimation performance when the data is contaminated by outliers or follows non-Gaussian distributions. To overcome this issue, a new EM algorithm is investigated for mixtures of elliptical distributions with the property of handling potential missing data. This paper shows that this problem reduces to the estimation of a mixture of Angular Gaussian distributions under generic assumptions (i.e., each sample is drawn from a mixture of elliptical distributions, which is possibly different from one sample to another). In that case, the complete-data likelihood associated with mixtures of elliptical distributions is well adapted to the EM framework with missing data thanks to its conditional distribution, which is shown to be a multivariate $t$-distribution. Experimental results on synthetic data demonstrate that the proposed algorithm is robust to outliers and can be used with non-Gaussian data. Furthermore, experiments conducted on real-world datasets show that this algorithm is very competitive when compared to other classical imputation methods.  ( 2 min )
    A Communication-Efficient Distributed Gradient Clipping Algorithm for Training Deep Neural Networks. (arXiv:2205.05040v1 [cs.LG])
    In distributed training of deep neural networks or Federated Learning (FL), people usually run Stochastic Gradient Descent (SGD) or its variants on each machine and communicate with other machines periodically. However, SGD might converge slowly in training some deep neural networks (e.g., RNN, LSTM) because of the exploding gradient issue. Gradient clipping is usually employed to address this issue in the single machine setting, but exploring this technique in the FL setting is still in its infancy: it remains mysterious whether the gradient clipping scheme can take advantage of multiple machines to enjoy parallel speedup. The main technical difficulty lies in dealing with a nonconvex loss function, a non-Lipschitz continuous gradient, and skipped communication rounds simultaneously. In this paper, we explore a relaxed-smoothness assumption of the loss landscape which LSTM was shown to satisfy in previous works and design a communication-efficient gradient clipping algorithm. This algorithm can be run on multiple machines, where each machine employs a gradient clipping scheme and communicates with other machines after multiple steps of gradient-based updates. Our algorithm is proved to have $O\left(\frac{1}{N\epsilon^4}\right)$ iteration complexity for finding an $\epsilon$-stationary point, where $N$ is the number of machines. This indicates that our algorithm enjoys linear speedup. We prove this result by introducing novel analysis techniques of estimating truncated random variables, which we believe are of independent interest. Our experiments on several benchmark datasets and various scenarios demonstrate that our algorithm indeed exhibits fast convergence speed in practice and thus validates our theory.  ( 2 min )
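    The general pattern described in the abstract above (each machine takes several clipped gradient steps locally, then the machines synchronise) can be illustrated with a toy simulation. This is a simplified sketch and not the paper's algorithm: it uses deterministic gradients, plain parameter averaging as the communication step, and hypothetical names (`clip_by_norm`, `local_clipped_sgd`) and hyperparameters.

```python
import math


def clip_by_norm(grad, max_norm):
    """Rescale grad so its Euclidean norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm or norm == 0.0:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]


def local_clipped_sgd(grad_fns, w0, lr, max_norm, local_steps, rounds):
    """Simulate several machines running clipped gradient steps with periodic averaging.

    grad_fns: one gradient function per simulated machine, each mapping w -> gradient.
    Every round, each machine starts from the shared model, takes `local_steps`
    clipped steps on its own objective, and the results are averaged.
    """
    w = list(w0)
    for _ in range(rounds):
        local_models = []
        for grad_fn in grad_fns:
            w_local = list(w)
            for _ in range(local_steps):
                g = clip_by_norm(grad_fn(w_local), max_norm)
                w_local = [wi - lr * gi for wi, gi in zip(w_local, g)]
            local_models.append(w_local)
        # Communication step: average the local models coordinate-wise.
        w = [sum(col) / len(col) for col in zip(*local_models)]
    return w


if __name__ == "__main__":
    # Two toy machines with quadratic losses 0.5 * (w - a)^2, a = 1.5 and a = 2.5.
    grad_fns = [lambda w: [w[0] - 1.5], lambda w: [w[0] - 2.5]]
    print(local_clipped_sgd(grad_fns, [10.0], lr=0.1, max_norm=1.0,
                            local_steps=5, rounds=200))
```

    In this toy setup the machines communicate once every `local_steps` updates rather than after every step, which is the communication saving the abstract refers to; the convergence rate claimed in the abstract is not something this sketch demonstrates.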
    NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality. (arXiv:2205.04421v2 [eess.AS] UPDATED)
    Text to speech (TTS) has made rapid progress in both academia and industry in recent years. Questions naturally arise as to whether a TTS system can achieve human-level quality, how to define/judge that quality, and how to achieve it. In this paper, we answer these questions by first defining human-level quality based on the statistical significance of a subjective measure and introducing appropriate guidelines to judge it, and then developing a TTS system called NaturalSpeech that achieves human-level quality on a benchmark dataset. Specifically, we leverage a variational autoencoder (VAE) for end-to-end text to waveform generation, with several key modules to enhance the capacity of the prior from text and reduce the complexity of the posterior from speech, including phoneme pre-training, differentiable duration modeling, bidirectional prior/posterior modeling, and a memory mechanism in VAE. Experimental evaluations on the popular LJSpeech dataset show that our proposed NaturalSpeech achieves -0.01 CMOS (comparative mean opinion score) to human recordings at the sentence level, with Wilcoxon signed rank test at p-level p >> 0.05, which demonstrates no statistically significant difference from human recordings for the first time on this dataset.  ( 2 min )
    Engineering flexible machine learning systems by traversing functionally invariant paths in weight space. (arXiv:2205.00334v2 [cs.LG] UPDATED)
    Deep neural networks achieve human-like performance on a variety of perceptual and decision making tasks. However, deep networks perform poorly when confronted with changing tasks or goals, and broadly fail to match the flexibility and robustness of human intelligence. Here, we develop a mathematical and algorithmic framework that enables continual training of deep neural networks on a broad range of objectives by defining path connected sets of neural networks that achieve equivalent functional performance on a given machine learning task while modulating network weights to achieve high-performance on a secondary objective. We view the weight space of a neural network as a curved Riemannian manifold and move a neural network along a functionally invariant path in weight space while searching for networks that satisfy a secondary objective. We introduce a path-sampling algorithm that trains networks with millions of weight parameters to learn a series of image classification tasks without performance loss. The algorithm generalizes to accommodate a range of secondary objectives including weight-pruning and weight diversification and exhibits state of the art performance on network compression and adversarial robustness benchmarks. Broadly, we demonstrate how the intrinsic geometry of machine learning problems can be harnessed to construct flexible and robust neural networks.  ( 2 min )
    Deep Learning Based Cloud Cover Parameterization for ICON. (arXiv:2112.11317v2 [physics.ao-ph] UPDATED)
    A promising approach to improve cloud parameterizations within climate models and thus climate projections is to use deep learning in combination with training data from storm-resolving model (SRM) simulations. The ICOsahedral Non-hydrostatic (ICON) modeling framework permits simulations ranging from numerical weather prediction to climate projections, making it an ideal target to develop neural network (NN) based parameterizations for sub-grid scale processes. Within the ICON framework, we train NN based cloud cover parameterizations with coarse-grained data based on realistic regional and global ICON SRM simulations. We set up three different types of NNs that differ in the degree of vertical locality they assume for diagnosing cloud cover from coarse-grained atmospheric state variables. The NNs accurately estimate sub-grid scale cloud cover from coarse-grained data that has similar geographical characteristics as their training data. Additionally, globally trained NNs can reproduce sub-grid scale cloud cover of the regional SRM simulation. Using the game-theory based interpretability library SHapley Additive exPlanations, we identify an overemphasis on specific humidity and cloud ice as the reason why our column-based NN cannot perfectly generalize from the global to the regional coarse-grained SRM data. The interpretability tool also helps visualize similarities and differences in feature importance between regionally and globally trained column-based NNs, and reveals a local relationship between their cloud cover predictions and the thermodynamic environment. Our results show the potential of deep learning to derive accurate yet interpretable cloud cover parameterizations from global SRMs, and suggest that neighborhood-based models may be a good compromise between accuracy and generalizability.  ( 2 min )
    SmartSAGE: Training Large-scale Graph Neural Networks using In-Storage Processing Architectures. (arXiv:2205.04711v1 [cs.AR])
    Graph neural networks (GNNs) can extract features by learning both the representation of each object (i.e., graph nodes) and the relationship across different objects (i.e., the edges that connect nodes), achieving state-of-the-art performance in various graph-based tasks. Despite their strengths, utilizing these algorithms in a production environment faces several challenges, as the number of graph nodes and edges ranges from several billion to hundreds of billions, requiring substantial storage space for training. Unfortunately, state-of-the-art ML frameworks employ an in-memory processing model which significantly hampers the productivity of ML practitioners as it mandates the overall working set to fit within DRAM capacity. In this work, we first conduct a detailed characterization of a state-of-the-art, large-scale GNN training algorithm, GraphSAGE. Based on the characterization, we then explore the feasibility of utilizing capacity-optimized NVM SSDs for storing memory-hungry GNN data, which enables large-scale GNN training beyond the limits of main memory size. Given the large performance gap between DRAM and SSD, however, blindly utilizing SSDs as a direct substitute for DRAM leads to significant performance loss. We therefore develop SmartSAGE, our software/hardware co-design based on an in-storage processing (ISP) architecture. Our work demonstrates that an ISP based large-scale GNN training system can achieve both high capacity storage and high performance, opening up opportunities for ML practitioners to train large GNN datasets without being hampered by the physical limitations of main memory size.  ( 2 min )
    An Algorithmic Framework for Bias Bounties. (arXiv:2201.10408v4 [cs.LG] UPDATED)
    We propose and analyze an algorithmic framework for "bias bounties": events in which external participants are invited to propose improvements to a trained model, akin to bug bounty events in software and security. Our framework allows participants to submit arbitrary subgroup improvements, which are then algorithmically incorporated into an updated model. Our algorithm has the property that there is no tension between overall and subgroup accuracies, nor between different subgroup accuracies, and it enjoys provable convergence to either the Bayes optimal model or a state in which no further improvements can be found by the participants. We provide formal analyses of our framework, experimental evaluation, and findings from a preliminary bias bounty event.  ( 2 min )
    RLFlow: Optimising Neural Network Subgraph Transformation with World Models. (arXiv:2205.01435v2 [cs.LG] UPDATED)
    Training deep learning models takes an extremely long execution time and consumes large amounts of computing resources. At the same time, recent research has proposed systems and compilers that are expected to decrease the runtime of deep learning models. An effective optimisation methodology in data processing is desirable, and the reduction of compute requirements of deep learning models is the focus of extensive research. In this paper, we address neural network sub-graph transformation by exploring reinforcement learning (RL) agents to achieve performance improvement. Our proposed approach, RLFlow, can learn to perform neural network subgraph transformations without the need for expertly designed heuristics to achieve a high level of performance. Recent work has aimed at applying RL to computer systems with some success, especially using model-free RL techniques. Model-based reinforcement learning methods have seen an increased focus in research as they can be used to learn the transition dynamics of the environment; this can be leveraged to train an agent using a hallucinogenic environment such as World Model (WM), thereby increasing sample efficiency compared to model-free approaches. WM uses variational auto-encoders to build a model of the system, which allows exploring the model in an inexpensive way. In RLFlow, we propose a design for a model-based agent with WM which learns to optimise the architecture of neural networks by performing a sequence of sub-graph transformations to reduce model runtime. We show that our approach can match the state-of-the-art performance on common convolutional networks and outperforms by up to 5% those based on transformer-style architectures.  ( 2 min )
    Training Personalized Recommendation Systems from (GPU) Scratch: Look Forward not Backwards. (arXiv:2205.04702v1 [cs.AR])
    Personalized recommendation models (RecSys) are one of the most popular machine learning workloads serviced by hyperscalers. A critical challenge of training RecSys is its high memory capacity requirements, reaching hundreds of GBs to TBs of model size. In RecSys, the so-called embedding layers account for the majority of memory usage, so current systems employ a hybrid CPU-GPU design in which the large CPU memory stores the memory-hungry embedding layers. Unfortunately, training embeddings involves several memory bandwidth intensive operations, which is at odds with the slow CPU memory and causes performance overheads. Prior work proposed to cache frequently accessed embeddings inside GPU memory as a means to filter down the embedding layer traffic to CPU memory, but this paper observes several limitations with such a cache design. In this work, we present a fundamentally different approach to designing embedding caches for RecSys. Our proposed ScratchPipe architecture utilizes unique properties of RecSys training to develop an embedding cache that sees not only the past but also the "future" cache accesses. ScratchPipe exploits this property to guarantee that the active working set of embedding layers can "always" be captured inside our proposed cache design, enabling embedding layer training to be conducted at GPU memory speed.  ( 2 min )
    Cognitive Visual-learning Environment for PostgreSQL. (arXiv:2205.04834v1 [cs.LG])
    PostgreSQL is an object-relational database (ORDBMS) that was introduced into the database community and has been avidly used for a variety of information extraction use cases. It is also known to be an advanced SQL-compliant open source Object RDBMS. However, many users have not yet adopted PostgreSQL because its purely textual environment remains layered and complex for an amateur user. Hence, there is a dire need to provide an easy environment in which users can comprehend the procedures and standards with which databases are created, how tables relate to one another, and how queries and their condition-based flow are manipulated in PostgreSQL. As such, this project identifies the dominant features offered by PostgreSQL, analyzes the constraints that the database user community faces in migrating to PostgreSQL and, based on the scope and constraints identified, develops a system that serves as a query generation platform as well as a learning tool that provides an interactive environment to cognitively learn PostgreSQL query building. This is achieved using a visual editor that incorporates a textual editor for well-versed users. By providing visually-draggable query components to work with, this research aims to offer a cognitive, visual and tactile environment where users can interactively learn PostgreSQL query generation.  ( 2 min )
    Learning Relative Return Policies With Upside-Down Reinforcement Learning. (arXiv:2202.12742v2 [cs.LG] UPDATED)
    Lately, there has been a resurgence of interest in using supervised learning to solve reinforcement learning problems. Recent work in this area has largely focused on learning command-conditioned policies. We investigate the potential of one such method -- upside-down reinforcement learning -- to work with commands that specify a desired relationship between some scalar value and the observed return. We show that upside-down reinforcement learning can learn to carry out such commands online in a tabular bandit setting and in CartPole with non-linear function approximation. By doing so, we demonstrate the power of this family of methods and open the way for their practical use under more complicated command structures.  ( 2 min )
    Federated Random Reshuffling with Compression and Variance Reduction. (arXiv:2205.03914v2 [cs.LG] UPDATED)
    Random Reshuffling (RR), which is a variant of Stochastic Gradient Descent (SGD) employing sampling without replacement, is an immensely popular method for training supervised machine learning models via empirical risk minimization. Due to its superior practical performance, it is embedded and often set as default in standard machine learning software. Under the name FedRR, this method was recently shown to be applicable to federated learning (Mishchenko et al., 2021), with superior performance when compared to common baselines such as Local SGD. Inspired by this development, we design three new algorithms to improve FedRR further: compressed FedRR and two variance reduced extensions, one for taming the variance coming from shuffling and the other for taming the variance due to compression. The variance reduction mechanism for compression allows us to eliminate dependence on the compression parameter, and applying additional controlled linear perturbations for Random Reshuffling, introduced by Malinovsky et al. (2021), helps to eliminate variance at the optimum. We provide the first analysis of compressed local methods under standard assumptions without bounded gradient assumptions and for heterogeneous data, overcoming the limitations of the compression operator. We corroborate our theoretical results with experiments on synthetic and real data sets.
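    As a rough illustration of the sampling-without-replacement idea behind Random Reshuffling, the sketch below runs plain single-machine RR on a toy one-dimensional least-squares problem. It is not FedRR or any of the compressed or variance-reduced variants from the paper; the function name `random_reshuffling`, the toy data, and the step size are all assumptions made for this example.

```python
import random

def random_reshuffling(grad_i, x0, n, lr=0.05, epochs=200, seed=0):
    """Plain Random Reshuffling: one fresh permutation of the n samples per epoch."""
    rng = random.Random(seed)
    x = x0
    order = list(range(n))
    for _ in range(epochs):
        rng.shuffle(order)           # sample WITHOUT replacement within the epoch
        for i in order:
            x = x - lr * grad_i(x, i)
    return x

# Toy objective (hypothetical): f(x) = (1/n) * sum_i 0.5 * (x - a_i)^2,
# whose minimizer is the mean of the a_i.
data = [1.0, 2.0, 3.0, 4.0]

def grad_i(x, i):
    return x - data[i]               # gradient of the i-th term

x_hat = random_reshuffling(grad_i, x0=0.0, n=len(data))
print(x_hat)                          # lands near the mean of `data`
```

    With a constant step size the iterate settles in a small neighbourhood of the minimizer rather than exactly on it, which is the kind of residual variance the abstract's variance-reduction extensions aim to remove.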
    Automatic Sleep Staging of EEG Signals: Recent Development, Challenges, and Future Directions. (arXiv:2111.08446v3 [eess.SP] UPDATED)
    Modern deep learning holds a great potential to transform clinical practice on human sleep. Teaching a machine to carry out routine tasks would be a tremendous reduction in workload for clinicians. Sleep staging, a fundamental step in sleep practice, is a suitable task for this and will be the focus in this article. Recently, automatic sleep staging systems have been trained to mimic manual scoring, leading to similar performance to human sleep experts, at least on scoring of healthy subjects. Despite tremendous progress, we have not seen automatic sleep scoring adopted widely in clinical environments. This review aims to give a shared view of the authors on the most recent state-of-the-art development in automatic sleep staging, the challenges that still need to be addressed, and the future directions for automatic sleep scoring to achieve clinical value.
    Semantic features of object concepts generated with GPT-3. (arXiv:2202.03753v2 [cs.CL] UPDATED)
    Semantic features have been playing a central role in investigating the nature of our conceptual representations. Yet the enormous time and effort required to empirically sample and norm features from human raters has restricted their use to a limited set of manually curated concepts. Given recent promising developments with transformer-based language models, here we asked whether it was possible to use such models to automatically generate meaningful lists of properties for arbitrary object concepts and whether these models would produce features similar to those found in humans. To this end, we probed a GPT-3 model to generate semantic features for 1,854 objects and compared automatically-generated features to existing human feature norms. GPT-3 generated many more features than humans, yet showed a similar distribution in the types of generated features. Generated feature norms rivaled human norms in predicting similarity, relatedness, and category membership, while variance partitioning demonstrated that these predictions were driven by similar variance in humans and GPT-3. Together, these results highlight the potential of large language models to capture important facets of human knowledge and yield a new approach for automatically generating interpretable feature sets, thus drastically expanding the potential use of semantic features in psychological and linguistic studies.
    Category-orthogonal object features guide information processing in recurrent neural networks trained for object categorization. (arXiv:2111.07898v2 [cs.CV] UPDATED)
    Recurrent neural networks (RNNs) have been shown to perform better than feedforward architectures in visual object categorization tasks, especially in challenging conditions such as cluttered images. However, little is known about the exact computational role of recurrent information flow in these conditions. Here we test RNNs trained for object categorization on the hypothesis that recurrence iteratively aids object categorization via the communication of category-orthogonal auxiliary variables (the location, orientation, and scale of the object). Using diagnostic linear readouts, we find that: (a) information about auxiliary variables increases across time in all network layers, (b) this information is indeed present in the recurrent information flow, and (c) its manipulation significantly affects task performance. These observations confirm the hypothesis that category-orthogonal auxiliary variable information is conveyed through recurrent connectivity and is used to optimize category inference in cluttered environments.
    Tight Last-Iterate Convergence of the Extragradient and the Optimistic Gradient Descent-Ascent Algorithm for Constrained Monotone Variational Inequalities. (arXiv:2204.09228v2 [math.OC] UPDATED)
    The monotone variational inequality is a central problem in mathematical programming that unifies and generalizes many important settings such as smooth convex optimization, two-player zero-sum games, convex-concave saddle point problems, etc. The extragradient algorithm by Korpelevich [1976] and the optimistic gradient descent-ascent algorithm by Popov [1980] are arguably the two most classical and popular methods for solving monotone variational inequalities. Despite its long history, the following major problem remains open. What is the last-iterate convergence rate of the extragradient algorithm or the optimistic gradient descent-ascent algorithm for monotone and Lipschitz variational inequalities with constraints? We resolve this open problem by showing that both the extragradient algorithm and the optimistic gradient descent-ascent algorithm have a tight $O\left(\frac{1}{\sqrt{T}}\right)$ last-iterate convergence rate for arbitrary convex feasible sets, which matches the lower bound by Golowich et al. [2020a, b]. Our rate is measured in terms of the standard gap function. At the core of our results lies a new performance measure -- the tangent residual, which can be viewed as an adaptation of the norm of the operator that takes the local constraints into account. We use the tangent residual (or a slight variation of the tangent residual) as the performance measure in our analysis of the extragradient algorithm (or the optimistic gradient descent-ascent algorithm). To establish the monotonicity of these performance measures, we develop a new approach that combines the power of the sum-of-squares programming with the low dimensionality of the update rule of the extragradient or the optimistic gradient descent-ascent algorithm. We believe our approach has many additional applications in the analysis of iterative methods.
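    For readers unfamiliar with the method, here is a minimal sketch of the projected extragradient update on a toy constrained saddle-point problem, min over x and max over y of x*y on the box [-1, 1]^2. The associated monotone operator is F(x, y) = (y, -x). The problem instance, step size, and helper names are assumptions chosen for illustration; the sketch does not reproduce the paper's analysis or its tangent-residual measure.

```python
import math

def project_box(z, lo=-1.0, hi=1.0):
    """Euclidean projection onto the box [lo, hi]^d."""
    return [min(hi, max(lo, zi)) for zi in z]

def F(z):
    """Monotone operator of the bilinear game min_x max_y x*y."""
    x, y = z
    return [y, -x]

def extragradient(z0, eta=0.5, iters=200):
    z = list(z0)
    for _ in range(iters):
        # Extrapolation step: look ahead using the operator at the current point.
        g = F(z)
        z_half = project_box([zi - eta * gi for zi, gi in zip(z, g)])
        # Update step: move from z using the operator at the look-ahead point.
        g_half = F(z_half)
        z = project_box([zi - eta * gi for zi, gi in zip(z, g_half)])
    return z

z_last = extragradient([1.0, 1.0])
print(math.hypot(*z_last))   # distance of the last iterate from the solution (0, 0)
```

    Plain projected gradient descent-ascent cycles or diverges on this bilinear example, whereas the extra look-ahead evaluation makes the last iterate approach the solution; the step size 0.5 is below 1/L for this operator, whose Lipschitz constant is L = 1.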
    Labeled sample compression schemes for complexes of oriented matroids. (arXiv:2110.15168v2 [math.CO] UPDATED)
    We show that the topes of a complex of oriented matroids (abbreviated COM) of VC-dimension $d$ admit a proper labeled sample compression scheme of size $d$. This considerably extends results of Moran and Warmuth on ample classes, of Ben-David and Litman on affine arrangements of hyperplanes, and of the authors on complexes of uniform oriented matroids, and is a step towards the sample compression conjecture -- one of the oldest open problems in computational learning theory. On the one hand, our approach exploits the rich combinatorial cell structure of COMs via oriented matroid theory. On the other hand, viewing tope graphs of COMs as partial cubes creates a fruitful link to metric graph theory.
    Memory-Efficient Convex Optimization for Self-Dictionary Separable Nonnegative Matrix Factorization: A Frank-Wolfe Approach. (arXiv:2109.11135v2 [eess.SP] UPDATED)
    Nonnegative matrix factorization (NMF) often relies on the separability condition for tractable algorithm design. Separability-based NMF is mainly handled by two types of approaches, namely, greedy pursuit and convex programming. A notable convex NMF formulation is the so-called self-dictionary multiple measurement vectors (SD-MMV), which can work without knowing the matrix rank a priori, and is arguably more resilient to error propagation relative to greedy pursuit. However, convex SD-MMV incurs a large memory cost that scales quadratically with the problem size. This memory challenge has been around for a decade, and has been a major obstacle to applying convex SD-MMV to big data analytics. This work proposes a memory-efficient algorithm for convex SD-MMV. Our algorithm capitalizes on the special update rules of a classic algorithm from the 1950s, namely, the Frank-Wolfe (FW) algorithm. It is shown that, under reasonable conditions, the FW algorithm solves the noisy SD-MMV problem with a memory cost that grows linearly with the amount of data. To handle noisier scenarios, a smoothed group sparsity regularizer is proposed to improve robustness while maintaining the low memory footprint with guarantees. The proposed approach presents the first linear memory complexity algorithmic framework for convex SD-MMV based NMF. The method is tested over a couple of unsupervised learning tasks, i.e., text mining and community detection, to showcase its effectiveness and memory efficiency.
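    To make the "special update rules" of Frank-Wolfe concrete, the following sketch runs the classic algorithm on a toy problem: minimizing 0.5 * ||x - b||^2 over the probability simplex. This is generic Frank-Wolfe, not the SD-MMV solver from the paper; the objective, the vector `b`, and the function name are assumptions for illustration only.

```python
def frank_wolfe_simplex(grad, x0, iters=2000):
    """Classic Frank-Wolfe over the probability simplex with step size 2/(k+2)."""
    x = list(x0)
    for k in range(iters):
        g = grad(x)
        # Linear minimization oracle: over the simplex the minimizer of <g, s>
        # is the vertex e_j with the smallest gradient coordinate.
        j = min(range(len(x)), key=lambda i: g[i])
        gamma = 2.0 / (k + 2.0)
        # Convex combination x <- (1 - gamma) * x + gamma * e_j.
        x = [(1.0 - gamma) * xi for xi in x]
        x[j] += gamma
    return x

# Toy target (hypothetical); it already lies in the simplex, so it is the minimizer.
b = [0.2, 0.3, 0.5]

def grad(x):
    return [xi - bi for xi, bi in zip(x, b)]   # gradient of 0.5 * ||x - b||^2

x_fw = frank_wolfe_simplex(grad, [1.0, 0.0, 0.0])
print(x_fw)
```

    Each iteration touches only one vertex of the feasible set and needs no projection, so iterates stay feasible and are built up one coordinate at a time. That structural property is the kind of feature a memory-conscious method can exploit.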
    Text Simplification by Tagging. (arXiv:2103.05070v1 [cs.CL] CROSS LISTED)
    Edit-based approaches have recently shown promising results on multiple monolingual sequence transduction tasks. In contrast to conventional sequence-to-sequence (Seq2Seq) models, which learn to generate text from scratch as they are trained on parallel corpora, these methods have proven to be much more effective since they are able to learn to make fast and accurate transformations while leveraging powerful pre-trained language models. Inspired by these ideas, we present TST, a simple and efficient Text Simplification system based on sequence Tagging, leveraging pre-trained Transformer-based encoders. Our system makes simplistic data augmentations and tweaks in training and inference on a pre-existing system, which makes it less reliant on large amounts of parallel training data, provides more control over the outputs and enables faster inference speeds. Our best model achieves near state-of-the-art performance on benchmark test datasets for the task. Since it is fully non-autoregressive, it achieves faster inference speeds by over 11 times than the current state-of-the-art text simplification system.
    VOS: Learning What You Don't Know by Virtual Outlier Synthesis. (arXiv:2202.01197v4 [cs.LG] UPDATED)
    Out-of-distribution (OOD) detection has received much attention lately due to its importance in the safe deployment of neural networks. One of the key challenges is that models lack supervision signals from unknown data, and as a result, can produce overconfident predictions on OOD data. Previous approaches rely on real outlier datasets for model regularization, which can be costly and sometimes infeasible to obtain in practice. In this paper, we present VOS, a novel framework for OOD detection by adaptively synthesizing virtual outliers that can meaningfully regularize the model's decision boundary during training. Specifically, VOS samples virtual outliers from the low-likelihood region of the class-conditional distribution estimated in the feature space. Alongside, we introduce a novel unknown-aware training objective, which contrastively shapes the uncertainty space between the ID data and synthesized outlier data. VOS achieves competitive performance on both object detection and image classification models, reducing the FPR95 by up to 9.36% compared to the previous best method on object detectors. Code is available at https://github.com/deeplearning-wisc/vos.
    Augmenting Physical Models with Deep Networks for Complex Dynamics Forecasting. (arXiv:2010.04456v6 [stat.ML] UPDATED)
    Forecasting complex dynamical phenomena in settings where only partial knowledge of their dynamics is available is a prevalent problem across various scientific fields. While purely data-driven approaches are arguably insufficient in this context, standard physical modeling based approaches tend to be over-simplistic, inducing non-negligible errors. In this work, we introduce the APHYNITY framework, a principled approach for augmenting incomplete physical dynamics described by differential equations with deep data-driven models. It consists in decomposing the dynamics into two components: a physical component accounting for the dynamics for which we have some prior knowledge, and a data-driven component accounting for errors of the physical model. The learning problem is carefully formulated such that the physical model explains as much of the data as possible, while the data-driven component only describes information that cannot be captured by the physical model, no more, no less. This not only provides the existence and uniqueness for this decomposition, but also ensures interpretability and benefits generalization. Experiments made on three important use cases, each representative of a different family of phenomena, i.e. reaction-diffusion equations, wave equations and the non-linear damped pendulum, show that APHYNITY can efficiently leverage approximate physical models to accurately forecast the evolution of the system and correctly identify relevant physical parameters. Code is available at https://github.com/yuan-yin/APHYNITY .
    Human-robot collaboration and machine learning: a systematic review of recent research. (arXiv:2110.07448v3 [cs.RO] UPDATED)
    Technological progress increasingly envisions the use of robots interacting with people in everyday life. Human-robot collaboration (HRC) is the approach that explores the interaction between a human and a robot, during the completion of a common objective, at the cognitive and physical level. In HRC works, a cognitive model is typically built, which collects inputs from the environment and from the user, then elaborates and translates these into information that can be used by the robot itself. Machine learning is a recent approach to building the cognitive model and behavioural block, with high potential in HRC. Consequently, this paper presents a thorough literature review of the use of machine learning techniques in the context of human-robot collaboration. 45 key papers were selected and analysed, and a clustering of works based on the type of collaborative tasks, evaluation metrics and cognitive variables modelled is proposed. Then, a deep analysis of different families of machine learning algorithms and their properties, along with the sensing modalities used, is carried out. Among the observations, the importance of machine learning algorithms that incorporate time dependencies is highlighted. The salient features of these works are then cross-analysed to show trends in HRC and give guidelines for future works, comparing them with other aspects of HRC that did not appear in the review.
    Semi-Targeted Model Poisoning Attack on Federated Learning via Backward Error Analysis. (arXiv:2203.11633v2 [cs.LG] UPDATED)
    Model poisoning attacks on federated learning (FL) intrude into the entire system by compromising an edge model, resulting in malfunctioning of machine learning models. Such compromised models are tampered with to perform adversary-desired behaviors. In particular, we considered a semi-targeted situation where the source class is predetermined but the target class is not. The goal is to cause the global classifier to misclassify data of the source class. Though approaches such as label flipping have been adopted to inject poisoned parameters into FL, it has been shown that their performance is usually class-sensitive, varying with the target class applied. Typically, an attack can become less effective when shifting to a different target class. To overcome this challenge, we propose the Attacking Distance-aware Attack (ADA) to enhance a poisoning attack by finding the optimized target class in the feature space. Moreover, we studied a more challenging situation where an adversary had limited prior knowledge about a client's data. To tackle this problem, ADA deduces pair-wise distances between different classes in the latent feature space from shared model parameters based on the backward error analysis. We performed extensive empirical evaluations on ADA by varying the factor of attacking frequency in three different image classification tasks. As a result, ADA succeeded in increasing the attack performance by 1.8 times in the most challenging case with an attacking frequency of 0.01.
    Why Exposure Bias Matters: An Imitation Learning Perspective of Error Accumulation in Language Generation. (arXiv:2204.01171v2 [cs.CL] UPDATED)
    Current language generation models suffer from issues such as repetition, incoherence, and hallucinations. An often-repeated hypothesis is that this brittleness of generation models is caused by the training and the generation procedure mismatch, also referred to as exposure bias. In this paper, we verify this hypothesis by analyzing exposure bias from an imitation learning perspective. We show that exposure bias leads to an accumulation of errors, analyze why perplexity fails to capture this accumulation, and empirically show that this accumulation results in poor generation quality. Source code to reproduce these experiments is available at https://github.com/kushalarora/quantifying_exposure_bias
    Verifying Integrity of Deep Ensemble Models by Lossless Black-box Watermarking with Sensitive Samples. (arXiv:2205.04145v2 [cs.CR] UPDATED)
    With the widespread use of deep neural networks (DNNs) in many areas, more and more studies focus on protecting DNN models from intellectual property (IP) infringement. Many existing methods apply digital watermarking to protect the DNN models. The majority of them either embed a watermark directly into the internal network structure/parameters or insert a zero-bit watermark by fine-tuning a model to be protected with a set of so-called trigger samples. Though these methods work very well, they were designed for individual DNN models, which cannot be directly applied to deep ensemble models (DEMs) that combine multiple DNN models to make the final decision. It motivates us to propose a novel black-box watermarking method in this paper for DEMs, which can be used for verifying the integrity of DEMs. In the proposed method, a certain number of sensitive samples are carefully selected through mimicking real-world DEM attacks and analyzing the prediction results of the sub-models of the non-attacked DEM and the attacked DEM on the carefully crafted dataset. By analyzing the prediction results of the target DEM on these carefully crafted sensitive samples, we are able to verify the integrity of the target DEM. Different from many previous methods, the proposed method does not modify the original DEM to be protected, which indicates that the proposed method is lossless. Experimental results have shown that the DEM integrity can be reliably verified even if only one sub-model was attacked, which has good potential in practice.
    Exploring Viable Algorithmic Options for Learning from Demonstration (LfD): A Parameterized Complexity Approach. (arXiv:2205.04989v1 [cs.LG])
    The key to reconciling the polynomial-time intractability of many machine learning tasks in the worst case with the surprising solvability of these tasks by heuristic algorithms in practice seems to be exploiting restrictions on real-world data sets. One approach to investigating such restrictions is to analyze why heuristics perform well under restrictions. A complementary approach would be to systematically determine under which sets of restrictions efficient and reliable machine learning algorithms do and do not exist. In this paper, we show how such a systematic exploration of algorithmic options can be done using parameterized complexity analysis. As an illustrative example, we give the first parameterized complexity analysis of batch and incremental policy inference under Learning from Demonstration (LfD). Relative to a basic model of LfD, we show that none of our problems can be solved efficiently either in general or relative to a number of (often simultaneous) restrictions on environments, demonstrations, and policies. We also give the first known restrictions under which efficient solvability is possible and discuss the implications of our solvability and unsolvability results for both our basic model of LfD and more complex models of LfD used in practice.
    Representing Hierarchical Structure by Using Cone Embedding. (arXiv:2102.08014v2 [cs.AI] UPDATED)
    Graph embedding is becoming an important method with applications in various areas, including social networks and knowledge graph completion. In particular, Poincaré embedding has been proposed to capture the hierarchical structure of graphs, and its effectiveness has been reported. However, most of the existing methods have isometric mappings in the embedding space, and the choice of the origin point can be arbitrary. This fact is not desirable when the distance from the origin is used as an indicator of hierarchy, as in the case of Poincaré embedding. In this paper, we propose cone embedding, an embedding method in a metric cone, which solves these problems, and we gain further benefits: 1) we provide an indicator of hierarchical information that is both geometrically and intuitively natural to interpret, and 2) we can extract the hierarchical structure from a graph embedding output of other methods by learning additional one-dimensional parameters.
    Entity Linking and Discovery via Arborescence-based Supervised Clustering. (arXiv:2109.01242v2 [cs.CL] UPDATED)
    Previous work has shown promising results in performing entity linking by measuring not only the affinities between mentions and entities but also those amongst mentions. In this paper, we present novel training and inference procedures that fully utilize mention-to-mention affinities by building minimum arborescences (i.e., directed spanning trees) over mentions and entities across documents in order to make linking decisions. We also show that this method gracefully extends to entity discovery, enabling the clustering of mentions that do not have an associated entity in the knowledge base. We evaluate our approach on the Zero-Shot Entity Linking dataset and MedMentions, the largest publicly available biomedical dataset, and show significant improvements in performance for both entity linking and discovery compared to identically parameterized models. We further show significant efficiency improvements with only a small loss in accuracy over previous work, which uses more computationally expensive models.
    Learning to Answer Visual Questions from Web Videos. (arXiv:2205.05019v1 [cs.CV])
    Recent methods for visual question answering rely on large-scale annotated datasets. Manual annotation of questions and answers for videos, however, is tedious, expensive and prevents scalability. In this work, we propose to avoid manual annotation and generate a large-scale training dataset for video question answering making use of automatic cross-modal supervision. We leverage a question generation transformer trained on text data and use it to generate question-answer pairs from transcribed video narrations. Given narrated videos, we then automatically generate the HowToVQA69M dataset with 69M video-question-answer triplets. To handle the open vocabulary of diverse answers in this dataset, we propose a training procedure based on a contrastive loss between a video-question multi-modal transformer and an answer transformer. We introduce the zero-shot VideoQA task and the VideoQA feature probe evaluation setting and show excellent results, in particular for rare answers. Furthermore, our method achieves competitive results on MSRVTT-QA, ActivityNet-QA, MSVD-QA and How2QA datasets. We also show that our VideoQA dataset generation approach generalizes to another source of web video and text data. We use our method to generate the \webdataname{} dataset from the WebVid dataset, i.e., videos with alt-text annotations, and show its benefits for training VideoQA models. Finally, for a detailed evaluation we introduce \smalldatasetname{}, a new VideoQA dataset with reduced language bias and high-quality manual annotations. Code, datasets and trained models are available at https://antoyang.github.io/just-ask.html
    Fundamental limitations on optimization in variational quantum algorithms. (arXiv:2205.05056v1 [quant-ph])
    Exploring quantum applications of near-term quantum devices is a rapidly growing field of quantum information science with both theoretical and practical interests. A leading paradigm to establish such near-term quantum applications is variational quantum algorithms (VQAs). These algorithms use a classical optimizer to train a parameterized quantum circuit to accomplish certain tasks, where the circuits are usually randomly initialized. In this work, we prove that for a broad class of such random circuits, the variation range of the cost function via adjusting any local quantum gate within the circuit vanishes exponentially in the number of qubits with a high probability. This result can unify the restrictions on gradient-based and gradient-free optimizations in a natural manner and reveal extra harsh constraints on the training landscapes of VQAs. Hence a fundamental limitation on the trainability of VQAs is unraveled, indicating the essence of the optimization hardness in the Hilbert space with exponential dimension. We further showcase the validity of our results with numerical simulations of representative VQAs. We believe that these results would deepen our understanding of the scalability of VQAs and shed light on the search for near-term quantum applications with advantages.
    Adaptive Ranking Based Constraint Handling for Explicitly Constrained Black-Box Optimization. (arXiv:1811.00764v3 [cs.NE] UPDATED)
    We propose a novel constraint-handling technique for the covariance matrix adaptation evolution strategy (CMA-ES). The proposed technique is aimed at solving explicitly constrained black-box continuous optimization problems, in which the explicit constraint is a constraint whereby the computational time for the constraint violation and its (numerical) gradient are negligible compared to that for the objective function. This method is designed to realize two invariance properties: invariance to the affine transformation of the search space, and invariance to the increasing transformation of the objective and constraint functions. The CMA-ES is designed to possess these properties for handling difficulties that appear in black-box optimization problems, such as non-separability, ill-conditioning, ruggedness, and the different orders of magnitude in the objective. The proposed constraint-handling technique (CHT), known as ARCH, modifies the underlying CMA-ES only in terms of the ranking of the candidate solutions. It employs a repair operator and an adaptive ranking aggregation strategy to compute the ranking. We developed test problems to evaluate the effects of the invariance properties, and performed experiments to empirically verify the invariance of the algorithm. We compared the proposed method with other CHTs on the CEC 2006 constrained optimization benchmark suite to demonstrate its efficacy. Empirical studies reveal that ARCH is able to exploit the explicitness of the constraint functions effectively, sometimes even more efficiently than an existing box-constraint handling technique on box-constrained problems, while exhibiting the invariance properties. Moreover, ARCH overwhelmingly outperforms CHTs by not exploiting the explicit constraints in terms of the number of objective function calls.
    Hybrid Far- and Near-Field Channel Estimation for THz Ultra-Massive MIMO via Fixed Point Networks. (arXiv:2205.04944v1 [eess.SP])
    Terahertz ultra-massive multiple-input multiple-output (THz UM-MIMO) is envisioned as one of the key enablers of 6G wireless systems. Due to the joint effect of its large array aperture and small wavelength, the near-field region of THz UM-MIMO systems is greatly enlarged. The high-dimensional channel of such systems thus consists of a stochastic mixture of far and near fields, which renders channel estimation extremely challenging. Previous works based on uni-field assumptions cannot capture the hybrid far- and near-field features, and suffer significant performance loss. This motivates us to consider hybrid-field channel estimation. We draw inspiration from fixed point theory to develop an efficient deep learning based channel estimator with adaptive complexity and a linear convergence guarantee. Built upon classic orthogonal approximate message passing, we transform each iteration into a contractive mapping, comprising a closed-form linear estimator and a neural network based non-linear estimator. A major algorithmic innovation involves applying fixed point iteration to compute the channel estimate while modeling neural networks with arbitrary depth and adapting to the hybrid-field channel conditions. Simulation results verify our theoretical analysis and show significant performance gains over state-of-the-art approaches in estimation accuracy and convergence rate.
    Impact of L1 Batch Normalization on Analog Noise Resistant Property of Deep Learning Models. (arXiv:2205.04886v1 [cs.LG])
    Analog hardware has recently become a popular choice for machine learning on resource-constrained devices due to its fast execution and energy efficiency. However, the inherent noise in analog hardware and its negative impact on deployed deep neural network (DNN) models limit its usage. The performance degradation caused by the noise calls for novel designs of DNN models with excellent noise-resistant properties, leveraging the properties of the fundamental building blocks of DNN models. In this work, the use of L1 or TopK BatchNorm, a fundamental DNN building block, is proposed for designing DNN models with excellent noise-resistant properties. Specifically, a systematic study is carried out by training DNN models with L1/TopK BatchNorm and comparing their performance with DNN models that use L2 BatchNorm. The noise resistance of the resulting models is tested by injecting additive noise into the model weights and evaluating the inference accuracy of the perturbed models. The results show that L1 and TopK BatchNorm have excellent noise-resistant properties, and that changing the BatchNorm type from L2 to L1/TopK does not sacrifice performance.
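    The noise-injection test described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the toy linear classifier, the Gaussian noise model, and the names `predict`, `accuracy` and `noisy_accuracy` are all assumptions chosen only to show the idea of perturbing weights and re-measuring accuracy.

```python
import random


def predict(weights, bias, x):
    """Toy linear classifier (hypothetical stand-in for a DNN): 1 if score > 0."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else 0


def accuracy(weights, bias, data):
    """Fraction of (features, label) pairs classified correctly."""
    return sum(predict(weights, bias, x) == y for x, y in data) / len(data)


def noisy_accuracy(weights, bias, data, sigma, trials=100, seed=0):
    """Average accuracy after adding zero-mean Gaussian noise to each weight.

    The Gaussian noise model and the number of trials are assumptions;
    the abstract only states that additive noise is injected into the weights.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        perturbed = [w + rng.gauss(0.0, sigma) for w in weights]
        total += accuracy(perturbed, bias, data)
    return total / trials


if __name__ == "__main__":
    data = [([1.0, 0.0], 1), ([-1.0, 0.0], 0), ([2.0, 1.0], 1), ([-2.0, -1.0], 0)]
    for sigma in (0.0, 0.5, 2.0):
        print(sigma, noisy_accuracy([1.0, 0.0], 0.0, data, sigma))
```

    Sweeping `sigma` and comparing the resulting accuracy curves for two models is one simple way to compare their noise resistance.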
    On Lottery Tickets and Minimal Task Representations in Deep Reinforcement Learning. (arXiv:2105.01648v4 [cs.LG] UPDATED)
    The lottery ticket hypothesis questions the role of overparameterization in supervised deep learning. But how is the performance of winning lottery tickets affected by the distributional shift inherent to reinforcement learning problems? In this work, we address this question by comparing sparse agents who have to address the non-stationarity of the exploration-exploitation problem with supervised agents trained to imitate an expert. We show that feed-forward networks trained with behavioural cloning compared to reinforcement learning can be pruned to higher levels of sparsity without performance degradation. This suggests that in order to solve the RL-specific distributional shift agents require more degrees of freedom. Using a set of carefully designed baseline conditions, we find that the majority of the lottery ticket effect in both learning paradigms can be attributed to the identified mask rather than the weight initialization. The input layer mask selectively prunes entire input dimensions that turn out to be irrelevant for the task at hand. At a moderate level of sparsity the mask identified by iterative magnitude pruning yields minimal task-relevant representations, i.e., an interpretable inductive bias. Finally, we propose a simple initialization rescaling which promotes the robust identification of sparse task representations in low-dimensional control tasks.
    Text-Free Prosody-Aware Generative Spoken Language Modeling. (arXiv:2109.03264v2 [cs.CL] UPDATED)
    Speech pre-training has primarily demonstrated efficacy on classification tasks, while its capability of generating novel speech, similar to how GPT-2 can generate coherent paragraphs, has barely been explored. Generative Spoken Language Modeling (GSLM) (Lakhotia et al., 2021) is the only prior work addressing the generative aspects of speech pre-training, which replaces text with discovered phone-like units for language modeling and shows the ability to generate meaningful novel sentences. Unfortunately, despite eliminating the need for text, the units used in GSLM discard most of the prosodic information. Hence, GSLM fails to leverage prosody for better comprehension, and does not generate expressive speech. In this work, we present a prosody-aware generative spoken language model (pGSLM). It is composed of a multi-stream transformer language model (MS-TLM) of speech, represented as discovered unit and prosodic feature streams, and an adapted HiFi-GAN model converting MS-TLM outputs to waveforms. We devise a series of metrics for prosody modeling and generation, and re-use metrics from GSLM for content modeling. Experimental results show that the pGSLM can utilize prosody to improve both prosody and content modeling, and also generate natural, meaningful, and coherent speech given a spoken prompt. Audio samples can be found at https://speechbot.github.io/pgslm. Code and models are available at https://github.com/pytorch/fairseq/tree/main/examples/textless_nlp/pgslm.
    Adaptive Graph Convolutional Network Framework for Multidimensional Time Series Prediction. (arXiv:2205.04885v1 [cs.LG])
    In the real world, long sequence time-series forecasting (LSTF) is needed in many cases, such as power consumption prediction and air quality prediction. Multi-dimensional long time series forecasting places stricter requirements on the model, which must not only capture the long-term dependence between input and output accurately, but also capture the relationship between data of different dimensions. Recent research shows that the Informer model, based on the Transformer, has achieved excellent performance in long time series prediction. However, this model still has deficiencies in multidimensional prediction: it cannot capture the relationship between different dimensions well. We improved Informer to address its shortcomings in multidimensional forecasting. First, we introduce an adaptive graph neural network to capture hidden dimension dependencies in time series prediction. Secondly, we integrate adaptive graph convolutional networks into various spatio-temporal series prediction models to remedy their inability to capture the relationship between different dimensions. Thirdly, in experiments on multiple data sets, the accuracy of the models improved by about 10% after our framework was introduced.
    An overview of artificial intelligence techniques for diagnosis of Schizophrenia based on magnetic resonance imaging modalities: Methods, challenges, and future works. (arXiv:2103.03081v3 [cs.LG] UPDATED)
    Schizophrenia (SZ) is a mental disorder that typically emerges in late adolescence or early adulthood. It reduces the life expectancy of patients by 15 years. Abnormal behavior, perception of emotions, social relationships, and reality perception are among its most significant symptoms. Past studies have revealed that SZ affects the temporal and anterior lobes of hippocampus regions of the brain. Also, increased volume of cerebrospinal fluid (CSF) and decreased volume of white and gray matter can be observed due to this disease. Magnetic resonance imaging (MRI) is a popular neuroimaging technique used to explore structural/functional brain abnormalities in SZ, owing to its high spatial resolution. Various artificial intelligence (AI) techniques have been employed with advanced image/signal processing methods to accurately diagnose SZ. This paper presents a comprehensive overview of studies conducted on the automated diagnosis of SZ using MRI modalities. First, an AI-based computer-aided diagnosis system (CADS) for SZ diagnosis and its relevant sections are presented. Then, the most important conventional machine learning (ML) and deep learning (DL) techniques in the diagnosis of SZ are introduced. A comprehensive comparison is also made between ML and DL studies in the discussion section. Next, the most important challenges in diagnosing SZ are addressed. Future works in diagnosing SZ using AI techniques and MRI modalities are recommended in another section. Results, conclusion, and research findings are also presented at the end.
    On learning agent-based models from data. (arXiv:2205.05052v1 [physics.soc-ph])
    Agent-Based Models (ABMs) are used in several fields to study the evolution of complex systems from micro-level assumptions. However, ABMs typically cannot estimate agent-specific (or "micro") variables: this is a major limitation which prevents ABMs from harnessing micro-level data availability and which greatly limits their predictive power. In this paper, we propose a protocol to learn the latent micro-variables of an ABM from data. The first step of our protocol is to reduce an ABM to a probabilistic model, characterized by a computationally tractable likelihood. This reduction follows two general design principles: balance of stochasticity and data availability, and replacement of unobservable discrete choices with differentiable approximations. Then, our protocol proceeds by maximizing the likelihood of the latent variables via a gradient-based expectation maximization algorithm. We demonstrate our protocol by applying it to an ABM of the housing market, in which agents with different incomes bid higher prices to live in high-income neighborhoods. We demonstrate that the obtained model allows accurate estimates of the latent variables, while preserving the general behavior of the ABM. We also show that our estimates can be used for out-of-sample forecasting. Our protocol can be seen as an alternative to black-box data assimilation methods that forces the modeler to lay bare the assumptions of the model, to think about the inferential process, and to spot potential identification problems.
    Secure Distributed/Federated Learning: Prediction-Privacy Trade-Off for Multi-Agent System. (arXiv:2205.04855v1 [cs.MA])
    Decentralized learning is an efficient emerging paradigm for boosting the computing capability of multiple bounded computing agents. In the big data era, performing inference within the distributed and federated learning (DL and FL) frameworks, the central server needs to process a large amount of data while relying on various agents to perform multiple distributed training tasks. Considering the decentralized computing topology, privacy has become a first-class concern. Moreover, assuming limited information processing capability for the agents calls for a sophisticated privacy-preserving decentralization that ensures efficient computation. Towards this end, we study the privacy-aware server to multi-agent assignment problem subject to information processing constraints associated with each agent, while maintaining the privacy and assuring learning informative messages received by agents about a global terminal through the distributed private federated learning (DPFL) approach. To find a decentralized scheme for a two-agent system, we formulate an optimization problem that balances privacy and accuracy, taking into account the quality of compression constraints associated with each agent. We propose an iterative converging algorithm by alternating over self-consistent equations. We also numerically evaluate the proposed solution to show the privacy-prediction trade-off and demonstrate the efficacy of the novel approach in ensuring privacy in DL and FL.
    On the Verge of Solving Rocket League using Deep Reinforcement Learning and Sim-to-sim Transfer. (arXiv:2205.05061v1 [cs.LG])
    Autonomously trained agents that are supposed to play video games reasonably well rely either on fast simulation speeds or heavy parallelization across thousands of machines running concurrently. This work explores a third way that is established in robotics, namely sim-to-real transfer, or if the game is considered a simulation itself, sim-to-sim transfer. In the case of Rocket League, we demonstrate that single behaviors of goalies and strikers can be successfully learned using Deep Reinforcement Learning in the simulation environment and transferred back to the original game. Although the implemented training simulation is to some extent inaccurate, the goalkeeping agent saves nearly 100% of its faced shots once transferred, while the striking agent scores in about 75% of cases. Therefore, the trained agent is robust enough and able to generalize to the target domain of Rocket League.
    Neural Collapse Under MSE Loss: Proximity to and Dynamics on the Central Path. (arXiv:2106.02073v4 [cs.LG] UPDATED)
    The recently discovered Neural Collapse (NC) phenomenon occurs pervasively in today's deep net training paradigm of driving cross-entropy (CE) loss towards zero. During NC, last-layer features collapse to their class-means, both classifiers and class-means collapse to the same Simplex Equiangular Tight Frame, and classifier behavior collapses to the nearest-class-mean decision rule. Recent works demonstrated that deep nets trained with mean squared error (MSE) loss perform comparably to those trained with CE. As a preliminary, we empirically establish that NC emerges in such MSE-trained deep nets as well through experiments on three canonical networks and five benchmark datasets. We provide, in a Google Colab notebook, PyTorch code for reproducing MSE-NC and CE-NC at https://colab.research.google.com/github/neuralcollapse/neuralcollapse/blob/main/neuralcollapse.ipynb. The analytically-tractable MSE loss offers more mathematical opportunities than the hard-to-analyze CE loss, inspiring us to leverage MSE loss towards the theoretical investigation of NC. We develop three main contributions: (I) We show a new decomposition of the MSE loss into (A) terms directly interpretable through the lens of NC and which assume the last-layer classifier is exactly the least-squares classifier; and (B) a term capturing the deviation from this least-squares classifier. (II) We exhibit experiments on canonical datasets and networks demonstrating that term-(B) is negligible during training. This motivates us to introduce a new theoretical construct: the central path, where the linear classifier stays MSE-optimal for feature activations throughout the dynamics. (III) By studying renormalized gradient flow along the central path, we derive exact dynamics that predict NC.
    Concave Utility Reinforcement Learning with Zero-Constraint Violations. (arXiv:2109.05439v2 [cs.LG] UPDATED)
    We consider the problem of tabular infinite horizon concave utility reinforcement learning (CURL) with convex constraints. Various learning applications with constraints, such as robotics, do not allow for policies that can violate constraints. To this end, we propose a model-based learning algorithm that achieves zero constraint violations. To obtain this result, we assume that the concave objective and the convex constraints have a solution interior to the set of feasible occupation measures. We then solve a tighter optimization problem to ensure that the constraints are never violated despite the imprecise model knowledge and model stochasticity. We also propose a novel Bellman error based analysis for tabular infinite-horizon setups which allows the analysis of stochastic policies. Combining the Bellman error based analysis and the tighter optimization equation, for $T$ interactions with the environment, we obtain a regret guarantee for the objective that grows as $\tilde{O}(1/\sqrt{T})$, excluding other factors.
    Search-Based Testing of Reinforcement Learning. (arXiv:2205.04887v1 [cs.LG])
    Evaluation of deep reinforcement learning (RL) is inherently challenging. Especially the opaqueness of learned policies and the stochastic nature of both agents and environments make testing the behavior of deep RL agents difficult. We present a search-based testing framework that enables a wide range of novel analysis capabilities for evaluating the safety and performance of deep RL agents. For safety testing, our framework utilizes a search algorithm that searches for a reference trace that solves the RL task. The backtracking states of the search, called boundary states, pose safety-critical situations. We create safety test-suites that evaluate how well the RL agent escapes safety-critical situations near these boundary states. For robust performance testing, we create a diverse set of traces via fuzz testing. These fuzz traces are used to bring the agent into a wide variety of potentially unknown states from which the average performance of the agent is compared to the average performance of the fuzz traces. We apply our search-based testing approach on RL for Nintendo's Super Mario Bros.
    Domain Generalization: A Survey. (arXiv:2103.02503v5 [cs.LG] UPDATED)
    Generalization to out-of-distribution (OOD) data is a capability natural to humans yet challenging for machines to reproduce. This is because most learning algorithms strongly rely on the i.i.d. assumption on source/target data, which is often violated in practice due to domain shift. Domain generalization (DG) aims to achieve OOD generalization by using only source data for model learning. Over the last ten years, research in DG has made great progress, leading to a broad spectrum of methodologies, e.g., those based on domain alignment, meta-learning, data augmentation, or ensemble learning, to name a few; DG has also been studied in various application areas including computer vision, speech recognition, natural language processing, medical imaging, and reinforcement learning. In this paper, for the first time a comprehensive literature review in DG is provided to summarize the developments over the past decade. Specifically, we first cover the background by formally defining DG and relating it to other relevant fields like domain adaptation and transfer learning. Then, we conduct a thorough review into existing methods and theories. Finally, we conclude this survey with insights and discussions on future research directions.
    Online AutoML: An adaptive AutoML framework for online learning. (arXiv:2201.09750v2 [cs.LG] UPDATED)
    Automated Machine Learning (AutoML) has been used successfully in settings where the learning task is assumed to be static. In many real-world scenarios, however, the data distribution will evolve over time, and it is yet to be shown whether AutoML techniques can effectively design online pipelines in dynamic environments. This study aims to automate pipeline design for online learning while continuously adapting to data drift. For this purpose, we design an adaptive Online Automated Machine Learning (OAML) system, searching the complete pipeline configuration space of online learners, including preprocessing algorithms and ensembling techniques. This system combines the inherent adaptation capabilities of online learners with the fast automated pipeline (re)optimization capabilities of AutoML. Focusing on optimization techniques that can adapt to evolving objectives, we evaluate asynchronous genetic programming and asynchronous successive halving to optimize these pipelines continually. We experiment on real and artificial data streams with varying types of concept drift to test the performance and adaptation capabilities of the proposed system. The results confirm the utility of OAML over popular online learning algorithms and underscore the benefits of continuous pipeline redesign in the presence of data drift.
    Cluster-based Input Weight Initialization for Echo State Networks. (arXiv:2103.04710v3 [cs.LG] CROSS LISTED)
    Echo State Networks (ESNs) are a special type of recurrent neural network (RNN), in which the input and recurrent connections are traditionally generated randomly, and only the output weights are trained. Despite the recent success of ESNs in various tasks of audio, image and radar recognition, we postulate that a purely random initialization is not the ideal way of initializing ESNs. The aim of this work is to propose an unsupervised initialization of the input connections using the $K$-Means algorithm on the training data. We show that for a large variety of datasets this initialization performs on par with or better than a randomly initialized ESN whilst needing significantly fewer reservoir neurons. Furthermore, we discuss how this approach provides the opportunity to estimate a suitable size of the reservoir based on prior knowledge about the data.
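    As a rough illustration of the idea in this abstract, the sketch below runs plain K-Means on training samples and uses the centroids as rows of an ESN input weight matrix. The abstract does not say how centroids are mapped to weights, so the one-neuron-per-centroid layout and the L2 normalization are assumptions, and `kmeans` and `input_weights_from_kmeans` are hypothetical helper names.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns k centroids as lists of floats."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        updated = []
        for old, members in zip(centroids, clusters):
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:
            break
        centroids = updated
    return centroids


def input_weights_from_kmeans(samples, n_reservoir, seed=0):
    """Input weight matrix with one row per reservoir neuron.

    Assumption: row i is centroid i scaled to unit L2 norm.
    """
    weights = []
    for centroid in kmeans(samples, n_reservoir, seed=seed):
        norm = math.sqrt(sum(v * v for v in centroid))
        weights.append([v / norm for v in centroid] if norm > 0 else centroid)
    return weights


if __name__ == "__main__":
    samples = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
    print(input_weights_from_kmeans(samples, n_reservoir=2))
```

    In this layout the number of clusters fixes the number of reservoir neurons, which is one way to read the abstract's remark about estimating reservoir size from prior knowledge of the data.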
    Turtle Score -- Similarity Based Developer Analyzer. (arXiv:2205.04876v1 [stat.ML])
    In day-to-day life, a highly demanding task for IT companies is to find the right candidates who fit the company's culture. This research aims to comprehend, analyze and automatically produce convincing outcomes to find a candidate who fits the company well. Data is collected and examined for each employee who works in the IT domain, focusing on their performance measures. This is done across various categories, which brings versatility and a wide view of focus. On this data, learner analysis is done using machine learning algorithms to obtain learner similarity and developer similarity in order to recruit people with similar working patterns. It has been shown that the efficiency and capability of a particular worker increase when working with a person of a similar personality. Therefore, this will serve as a useful tool for recruiters who aim to recruit people with high productivity. That is, the designed model aims to render the best outcome possible with high accuracy and a reliable recommendation score.
    A Wasserstein distance approach for concentration of empirical risk estimates. (arXiv:1902.10709v4 [math.ST] UPDATED)
    This paper presents a unified approach based on Wasserstein distance to derive concentration bounds for empirical estimates for two broad classes of risk measures defined in the paper. The classes of risk measures introduced include as special cases well known risk measures from the finance literature such as conditional value at risk (CVaR), optimized certainty equivalent risk, spectral risk measures, utility-based shortfall risk, cumulative prospect theory (CPT) value, rank dependent expected utility and distorted risk measures. Two estimation schemes are considered, one for each class of risk measures. One estimation scheme involves applying the risk measure to the empirical distribution function formed from a collection of i.i.d. samples of the random variable (r.v.), while the second scheme involves applying the same procedure to a truncated sample. The bounds provided apply to three popular classes of distributions, namely sub-Gaussian, sub-exponential and heavy-tailed distributions. The bounds are derived by first relating the estimation error to the Wasserstein distance between the true and empirical distributions, and then using recent concentration bounds for the latter. Previous concentration bounds are available only for specific risk measures such as CVaR and CPT-value. The bounds derived in this paper are shown to either match or improve upon previous bounds in cases where they are available. The usefulness of the bounds is illustrated through an algorithm and the corresponding regret bound for a stochastic bandit problem involving a general risk measure from each of the two classes introduced in the paper.
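    To make the first estimation scheme concrete for one of the listed risk measures, here is a minimal sketch of CVaR applied to the empirical distribution of i.i.d. samples. It treats samples as losses (larger is worse) and uses the standard Rockafellar-Uryasev form; the function names are hypothetical, and the truncated-sample scheme from the abstract is not shown.

```python
import math


def empirical_var(samples, alpha):
    """Empirical value at risk: the ceil(alpha * n)-th smallest sample."""
    ordered = sorted(samples)
    # The small offset guards against floating-point error in alpha * n.
    k = max(1, math.ceil(alpha * len(ordered) - 1e-12))
    return ordered[k - 1]


def empirical_cvar(samples, alpha):
    """CVaR of the empirical distribution at level alpha in (0, 1).

    Computed as VaR + E[(X - VaR)^+] / (1 - alpha), with the expectation
    taken under the empirical distribution of the samples.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    var = empirical_var(samples, alpha)
    excess = sum(max(x - var, 0.0) for x in samples) / len(samples)
    return var + excess / (1.0 - alpha)


if __name__ == "__main__":
    losses = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    print(empirical_cvar(losses, 0.8))  # average of the two largest losses, about 9.5
```

    The concentration bounds in the paper concern how far such an estimate can deviate from the true risk value as the number of samples grows.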
    Don't Throw it Away! The Utility of Unlabeled Data in Fair Decision Making. (arXiv:2205.04790v1 [stat.ML])
    Decision making algorithms, in practice, are often trained on data that exhibits a variety of biases. Decision-makers often aim to take decisions based on some ground-truth target that is assumed or expected to be unbiased, i.e., equally distributed across socially salient groups. In many practical settings, the ground-truth cannot be directly observed, and instead, we have to rely on a biased proxy measure of the ground-truth, i.e., biased labels, in the data. In addition, data is often selectively labeled, i.e., even the biased labels are only observed for a small fraction of the data that received a positive decision. To overcome label and selection biases, recent work proposes to learn stochastic, exploring decision policies via i) online training of new policies at each time-step and ii) enforcing fairness as a constraint on performance. However, the existing approach uses only labeled data, disregarding a large amount of unlabeled data, and thereby suffers from high instability and variance in the learned decision policies at different times. In this paper, we propose a novel method based on a variational autoencoder for practical fair decision-making. Our method learns an unbiased data representation leveraging both labeled and unlabeled data and uses the representations to learn a policy in an online process. Using synthetic data, we empirically validate that our method converges to the optimal (fair) policy according to the ground-truth with low variance. In real-world experiments, we further show that our training approach not only offers a more stable learning process but also yields policies with higher fairness as well as utility than previous approaches.
    Disentangling A Single MR Modality. (arXiv:2205.04982v1 [eess.IV])
    Disentangling anatomical and contrast information from medical images has gained attention recently, demonstrating benefits for various image analysis tasks. Current methods learn disentangled representations using either paired multi-modal images with the same underlying anatomy or auxiliary labels (e.g., manual delineations) to provide inductive bias for disentanglement. However, these requirements could significantly increase the time and cost in data collection and limit the applicability of these methods when such data are not available. Moreover, these methods generally do not guarantee disentanglement. In this paper, we present a novel framework that learns theoretically and practically superior disentanglement from single modality magnetic resonance images. Moreover, we propose a new information-based metric to quantitatively evaluate disentanglement. Comparisons over existing disentangling methods demonstrate that the proposed method achieves superior performance in both disentanglement and cross-domain image-to-image translation tasks.
    On Recurrent Neural Networks for learning-based control: recent results and ideas for future developments. (arXiv:2111.13557v2 [eess.SY] UPDATED)
    This paper aims to discuss and analyze the potentialities of Recurrent Neural Networks (RNN) in control design applications. The main families of RNN are considered, namely Neural Nonlinear AutoRegressive eXogenous, (NNARX), Echo State Networks (ESN), Long Short Term Memory (LSTM), and Gated Recurrent Units (GRU). The goal is twofold. Firstly, to survey recent results concerning the training of RNN that enjoy Input-to-State Stability (ISS) and Incremental Input-to-State Stability ($\delta$ISS) guarantees. Secondly, to discuss the issues that still hinder the widespread use of RNN for control, namely their robustness, verifiability, and interpretability. The former properties are related to the so-called generalization capabilities of the networks, i.e. their consistency with the underlying real plants, even in presence of unseen or perturbed input trajectories. The latter is instead related to the possibility of providing a clear formal connection between the RNN model and the plant. In this context, we illustrate how ISS and $\delta$ISS represent a significant step towards the robustness and verifiability of the RNN models, while the requirement of interpretability paves the way to the use of physics-based networks. The design of model predictive controllers with RNN as plant's model is also briefly discussed. Lastly, some of the main topics of the paper are illustrated on a simulated chemical system.
    Spike-based computational models of bio-inspired memories in the hippocampal CA3 region on SpiNNaker. (arXiv:2205.04782v1 [cs.NE])
    The human brain is the most powerful and efficient machine in existence today, surpassing in many ways the capabilities of modern computers. Currently, lines of research in neuromorphic engineering are trying to develop hardware that mimics the functioning of the brain to acquire these superior capabilities. One of the areas still under development is the design of bio-inspired memories, where the hippocampus plays an important role. This region of the brain acts as a short-term memory with the ability to store associations of information from different sensory streams in the brain and recall them later. This is possible thanks to the recurrent collateral network architecture that constitutes CA3, the main sub-region of the hippocampus. In this work, we developed two spike-based computational models of fully functional hippocampal bio-inspired memories for the storage and recall of complex patterns implemented with spiking neural networks on the SpiNNaker hardware platform. These models present different levels of biological abstraction, with the first model having a constant oscillatory activity closer to the biological model, and the second one having an energy-efficient regulated activity, which, although it is still bio-inspired, opts for a more functional approach. Different experiments were performed for each of the models, in order to test their learning/recalling capabilities. A comprehensive comparison between the functionality and the biological plausibility of the presented models was carried out, showing their strengths and weaknesses. The two models, which are publicly available for researchers, could pave the way for future spike-based implementations and applications.
    Gradient flows on graphons: existence, convergence, continuity equations. (arXiv:2111.09459v2 [math.PR] UPDATED)
    Wasserstein gradient flows on probability measures have found a host of applications in various optimization problems. They typically arise as the continuum limit of exchangeable particle systems evolving by some mean-field interaction involving a gradient-type potential. However, in many problems, such as in multi-layer neural networks, the so-called particles are edge weights on large graphs whose nodes are exchangeable. Such large graphs are known to converge to continuum limits called graphons as their size grows to infinity. We show that the Euclidean gradient flow of a suitable function of the edge-weights converges to a novel continuum limit given by a curve on the space of graphons that can be appropriately described as a gradient flow or, more technically, a curve of maximal slope. Several natural functions on graphons, such as homomorphism functions and the scalar entropy, are covered by our set-up, and the examples have been worked out in detail.
    Classification and mapping of low-statured 'shrubland' cover types in post-agricultural landscapes of the US Northeast. (arXiv:2205.05047v1 [cs.CV])
    Context: Novel plant communities reshape landscapes and pose challenges for land cover classification and mapping that can constrain research and stewardship efforts. In the US Northeast, emergence of low-statured woody vegetation, or 'shrublands', instead of secondary forests in post-agricultural landscapes is well-documented by field studies, but poorly understood from a landscape perspective, which limits the ability to systematically study and manage these lands. Objectives: To address gaps in classification/mapping of low-statured cover types where they have been historically rare, we developed models to predict 'shrubland' distributions at 30m resolution across New York State (NYS), using machine learning and model ensembling techniques to integrate remote sensing of structural (airborne LIDAR) and optical (satellite imagery) properties of vegetation cover. We first classified a 1m canopy height model (CHM), derived from a "patchwork" of available LIDAR coverages, to define shrubland presence/absence. Next, these non-contiguous maps were used to train a model ensemble based on temporally-segmented imagery to predict 'shrubland' probability for the entire study landscape (NYS). Results: Approximately 2.5% of the CHM coverage area was classified as shrubland. Models using Landsat predictors trained on the classified CHM were effective at identifying shrubland (test set AUC=0.893, real-world AUC=0.904), in discriminating between shrub/young forest and other cover classes, and produced qualitatively sensible maps, even when extending beyond the original training data. Conclusions: After ground-truthing, we expect these shrubland maps and models will have many research and stewardship applications including wildlife conservation, invasive species mitigation and natural climate solutions.
    Long-term stability and generalization of observationally-constrained stochastic data-driven models for geophysical turbulence. (arXiv:2205.04601v1 [cs.LG])
    Recent years have seen a surge in interest in building deep learning-based fully data-driven models for weather prediction. Such deep learning models if trained on observations can mitigate certain biases in current state-of-the-art weather models, some of which stem from inaccurate representation of subgrid-scale processes. However, these data-driven models, being over-parameterized, require a lot of training data which may not be available from reanalysis (observational data) products. Moreover, an accurate, noise-free, initial condition to start forecasting with a data-driven weather model is not available in realistic scenarios. Finally, deterministic data-driven forecasting models suffer from issues with long-term stability and unphysical climate drift, which makes these data-driven models unsuitable for computing climate statistics. Given these challenges, previous studies have tried to pre-train deep learning-based weather forecasting models on a large amount of imperfect long-term climate model simulations and then re-train them on available observational data. In this paper, we propose a convolutional variational autoencoder-based stochastic data-driven model that is pre-trained on an imperfect climate model simulation from a 2-layer quasi-geostrophic flow and re-trained, using transfer learning, on a small number of noisy observations from a perfect simulation. This re-trained model then performs stochastic forecasting with a noisy initial condition sampled from the perfect simulation. We show that our ensemble-based stochastic data-driven model outperforms a baseline deterministic encoder-decoder-based convolutional model in terms of short-term skills while remaining stable for long-term climate simulations yielding accurate climatology.
    Hyperparameter optimization of hybrid quantum neural networks for car classification. (arXiv:2205.04878v1 [quant-ph])
    Image recognition is one of the primary applications of machine learning algorithms. Nevertheless, machine learning models used in modern image recognition systems consist of millions of parameters that usually require significant computational time to be adjusted. Moreover, adjustment of model hyperparameters leads to additional overhead. Because of this, new developments in machine learning models and hyperparameter optimization techniques are required. This paper presents a quantum-inspired hyperparameter optimization technique and a hybrid quantum-classical machine learning model for supervised learning. We benchmark our hyperparameter optimization method over standard black-box objective functions and observe performance improvements in the form of reduced expected run times and fitness in response to the growth in the size of the search space. We test our approaches in a car image classification task, and demonstrate a full-scale implementation of the hybrid quantum neural network model with the tensor train hyperparameter optimization. Our tests show a qualitative and quantitative advantage over the corresponding standard classical tabular grid search approach used with a deep neural network ResNet34. A classification accuracy of 0.97 was obtained by the hybrid model after 18 iterations, whereas the classical model achieved an accuracy of 0.92 after 75 iterations.
    PyRCN: A Toolbox for Exploration and Application of Reservoir Computing Networks. (arXiv:2103.04807v3 [cs.LG] UPDATED)
    Reservoir Computing Networks (RCNs) belong to a group of machine learning techniques that project the input space non-linearly into a high-dimensional feature space, where the underlying task can be solved linearly. Popular variants of RCNs are capable of solving complex tasks equivalently to widely used deep neural networks, but with a substantially simpler training paradigm based on linear regression. In this paper, we show how to uniformly describe RCNs with small and clearly defined building blocks, and we introduce the Python toolbox PyRCN (Python Reservoir Computing Networks) for optimizing, training and analyzing RCNs on arbitrarily large datasets. The tool is based on widely-used scientific packages and complies with the scikit-learn interface specification. It provides a platform for educational and exploratory analyses of RCNs, as well as a framework to apply RCNs on complex tasks including sequence processing. With a small number of building blocks, the framework allows the implementation of numerous different RCN architectures. We provide code examples on how to set up RCNs for time series prediction and for sequence classification tasks. PyRCN is around ten times faster than reference toolboxes on a benchmark task while requiring substantially less boilerplate code.
    Optimizing over an ensemble of neural networks. (arXiv:2112.07007v2 [cs.LG] UPDATED)
    We study optimization problems where the objective function is modeled through feedforward neural networks with rectified linear unit (ReLU) activation. Recent literature has explored the use of a single neural network to model either uncertain or complex elements within an objective function. However, it is well known that ensembles of neural networks produce more stable predictions and have better generalizability than models with single neural networks, which motivates the investigation of ensembles of neural networks rather than single neural networks in decision-making pipelines. We study how to incorporate a neural network ensemble as the objective function of an optimization model and explore computational approaches for the ensuing problem. We present a mixed-integer linear program based on existing popular big-M formulations for optimizing over a single neural network. We develop a two-phase approach for our model that combines preprocessing procedures to tighten bounds for critical neurons in the neural networks with a Lagrangian relaxation-based branch-and-bound approach. Experimental evaluations of our solution methods suggest that using ensembles of neural networks yields more stable and higher quality solutions, compared to single neural networks, and that our optimization algorithm outperforms (the adaption of) a state-of-the-art approach in terms of computational time and optimality gaps.
    DeepAuditor: Distributed Online Intrusion Detection System for IoT devices via Power Side-channel Auditing. (arXiv:2106.12753v3 [cs.CR] UPDATED)
    As the number of IoT devices has increased rapidly, IoT botnets have exploited the vulnerabilities of IoT devices. However, it is still challenging to detect the initial intrusion on IoT devices prior to massive attacks. Recent studies have utilized power side-channel information to identify this intrusion behavior on IoT devices but still lack accurate real-time models for ubiquitous botnet detection. We proposed the first online intrusion detection system called DeepAuditor for IoT devices via power auditing. To develop the real-time system, we proposed a lightweight power auditing device called Power Auditor. We also designed a distributed CNN classifier for online inference in a laboratory setting. To protect against data leakage and reduce networking redundancy, we then proposed a privacy-preserved inference protocol via Packed Homomorphic Encryption and a sliding window protocol in our system. The classification accuracy and processing time were measured, and the proposed classifier outperformed a baseline classifier, especially against unseen patterns. We also demonstrated that the distributed CNN design is secure against any distributed components. Overall, the measurements demonstrated the feasibility of our real-time distributed system for intrusion detection on IoT devices.
    Adaptation Strategies for Automated Machine Learning on Evolving Data. (arXiv:2006.06480v3 [cs.LG] UPDATED)
    Automated Machine Learning (AutoML) systems have been shown to efficiently build good models for new datasets. However, it is often not clear how well they can adapt when the data evolves over time. The main goal of this study is to understand the effect of data stream challenges such as concept drift on the performance of AutoML methods, and which adaptation strategies can be employed to make them more robust. To that end, we propose 6 concept drift adaptation strategies and evaluate their effectiveness on different AutoML approaches. We do this for a variety of AutoML approaches for building machine learning pipelines, including those that leverage Bayesian optimization, genetic programming, and random search with automated stacking. These are evaluated empirically on real-world and synthetic data streams with different types of concept drift. Based on this analysis, we propose ways to develop more sophisticated and robust AutoML techniques.
    GRU-TV: Time- and velocity-aware GRU for patient representation on multivariate clinical time-series data. (arXiv:2205.04892v1 [cs.LG])
    Electronic health records (EHRs) provide a rich repository to track a patient's health status. EHRs seek to fully document the patient's physiological status, and include data that is high dimensional, heterogeneous, and multimodal. The significant differences in the sampling frequency of clinical variables can result in high missing rates and uneven time intervals between adjacent records in the multivariate clinical time-series data extracted from EHRs. Current studies using clinical time-series data for patient characterization view the patient's physiological status as a discrete process described by sporadically collected values, while the dynamics in the patient's physiological status are time-continuous. In addition, recurrent neural network (RNN) models widely used for patient representation learning lack the perception of time intervals and velocity, which limits the ability of the model to represent the physiological status of the patient. In this paper, we propose an improved gated recurrent unit (GRU), namely time- and velocity-aware GRU (GRU-TV), for patient representation learning of clinical multivariate time-series data in a time-continuous manner. In the proposed GRU-TV, neural ordinary differential equations (ODEs) and a velocity perception mechanism are used to perceive the time interval between records in the time-series data and the changing rate of the patient's physiological status, respectively. Experimental results on two real-world clinical EHR datasets (PhysioNet2012, MIMIC-III) show that GRU-TV achieves state-of-the-art performance in computer aided diagnosis (CAD) tasks, and is more advantageous in processing sampled data.
    Learning Combinatorial Node Labeling Algorithms. (arXiv:2106.03594v3 [cs.LG] UPDATED)
    We present a novel neural architecture to solve graph optimization problems where the solution consists of arbitrary node labels, allowing us to solve hard problems like graph coloring. We train our model using reinforcement learning, specifically policy gradients, which gives us both a greedy and a probabilistic policy. Our architecture builds on a graph attention network and uses several inductive biases to improve solution quality. Our learned deterministic heuristics for graph coloring give better solutions than classical degree-based greedy heuristics and only take seconds to apply to graphs with tens of thousands of vertices. Moreover, our probabilistic policies outperform all greedy state-of-the-art coloring baselines and a machine learning baseline. Finally, we show that our approach also generalizes to other problems by evaluating it on minimum vertex cover and outperforming two greedy heuristics.
    Accelerated functional brain aging in major depressive disorder: evidence from a large scale fMRI analysis of Chinese participants. (arXiv:2205.04871v1 [q-bio.NC])
    Major depressive disorder (MDD) is one of the most common mental health conditions that has been intensively investigated for its association with brain atrophy and mortality. Recent studies reveal that the deviation between the predicted and the chronological age can be a marker of accelerated brain aging to characterize MDD. However, current conclusions are usually drawn based on structural MRI information collected from Caucasian participants. The universality of this biomarker needs to be further validated by subjects with different ethnic/racial backgrounds and by different types of data. Here we make use of the REST-meta-MDD, a large scale resting-state fMRI dataset collected from multiple cohort participants in China. We develop a stacking machine learning model based on 1101 healthy controls, which estimates a subject's chronological age from fMRI with promising accuracy. The trained model is then applied to 1276 MDD patients from 24 sites. We observe that MDD patients exhibit a $+4.43$ years ($p < 0.0001$, Cohen's $d = 0.35$, 95% CI: $1.86 - 3.91$) higher brain-predicted age difference (brain-PAD) compared to controls. In the MDD subgroup, we observe a statistically significant $+2.09$ years ($p < 0.05$, Cohen's $d = 0.134483$) brain-PAD in antidepressant users compared to medication-free patients. The statistical relationship observed is further checked by three different machine learning algorithms. The positive brain-PAD observed in participants in China confirms the presence of accelerated brain aging in MDD patients. The utilization of functional brain connectivity for age estimation verifies existing findings from a new dimension.
    Pediatric Automatic Sleep Staging: A comparative study of state-of-the-art deep learning methods. (arXiv:2108.10211v3 [eess.SP] UPDATED)
    Background: Despite the tremendous progress recently made towards automatic sleep staging in adults, it is currently unknown if the most advanced algorithms generalize to the pediatric population, which displays distinctive characteristics in overnight polysomnography (PSG). Methods: To answer the question, in this work, we conduct a large-scale comparative study on the state-of-the-art deep learning methods for pediatric automatic sleep staging. Six different deep neural networks with diverging features are adopted to evaluate a sample of more than 1,200 children across a wide spectrum of obstructive sleep apnea (OSA) severity. Results: Our experimental results show that the individual performance of automated pediatric sleep stagers when evaluated on new subjects is equivalent to the expert-level one reported on adults. Combining the six stagers into ensemble models further boosts the staging accuracy, reaching an overall accuracy of 88.8%, a Cohen's kappa of 0.852, and a macro F1-score of 85.8%. At the same time, the ensemble models lead to reduced predictive uncertainty. The results also show that the studied algorithms and their ensembles are robust to concept drift when the training and test data were recorded seven months apart and after clinical intervention. Conclusion: However, we show that the improvements in the staging performance are not necessarily clinically significant although the ensemble models lead to more favorable clinical measures than the six standalone models. Significance: Detailed analyses further demonstrate "almost perfect" agreement between the automatic stagers to one another and their similar patterns on the staging errors, suggesting little room for improvement.
    Differentially Private Learning with Adaptive Clipping. (arXiv:1905.03871v5 [cs.LG] UPDATED)
    Existing approaches for training neural networks with user-level differential privacy (e.g., DP Federated Averaging) in federated learning (FL) settings involve bounding the contribution of each user's model update by clipping it to some constant value. However there is no good a priori setting of the clipping norm across tasks and learning settings: the update norm distribution depends on the model architecture and loss, the amount of data on each device, the client learning rate, and possibly various other parameters. We propose a method wherein instead of a fixed clipping norm, one clips to a value at a specified quantile of the update norm distribution, where the value at the quantile is itself estimated online, with differential privacy. The method tracks the quantile closely, uses a negligible amount of privacy budget, is compatible with other federated learning technologies such as compression and secure aggregation, and has a straightforward joint DP analysis with DP-FedAvg. Experiments demonstrate that adaptive clipping to the median update norm works well across a range of realistic federated learning tasks, sometimes outperforming even the best fixed clip chosen in hindsight, and without the need to tune any clipping hyperparameter.
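    The quantile-tracking idea in the abstract above can be sketched roughly as follows. This is a minimal, non-private illustration and not the paper's implementation: the geometric update rule, the learning rate `lr`, the log-normal norm distribution, and the function names are all assumptions made for the example, and the noise a differentially private estimate would require is omitted.

```python
import math
import random


def update_clip_norm(clip_norm, update_norms, target_quantile=0.5, lr=0.2):
    """One round of online quantile tracking (non-private sketch)."""
    # Fraction of this round's update norms that already fit under the clip norm.
    frac_below = sum(n <= clip_norm for n in update_norms) / len(update_norms)
    # Too many fit: shrink the norm. Too few fit: grow it (assumed geometric step).
    return clip_norm * math.exp(-lr * (frac_below - target_quantile))


def clip(update, clip_norm):
    """Scale a vector down so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def simulate(rounds=300, clients=100, seed=0):
    """Track the median of synthetic update norms, starting far too low."""
    rng = random.Random(seed)
    clip_norm = 0.1
    for _ in range(rounds):
        # Hypothetical per-client update norms; their true median is 1.0.
        norms = [rng.lognormvariate(0.0, 0.5) for _ in range(clients)]
        clip_norm = update_clip_norm(clip_norm, norms)
    return clip_norm
```

    In this toy setup `simulate()` should settle near the median of the synthetic norm distribution without any hand-tuned clipping constant, which is the behaviour the abstract describes.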
    Adjusted Expected Improvement for Cumulative Regret Minimization in Noisy Bayesian Optimization. (arXiv:2205.04901v1 [cs.LG])
    The expected improvement (EI) is one of the most popular acquisition functions for Bayesian optimization (BO) and has demonstrated good empirical performance in many applications for the minimization of simple regret. However, under the evaluation metric of cumulative regret, the performance of EI may not be competitive, and its existing theoretical regret upper bound still has room for improvement. To adapt the EI for better performance under cumulative regret, we introduce a novel quantity called the evaluation cost which is compared against the acquisition function, and with this, develop the expected improvement-cost (EIC) algorithm. In each iteration of EIC, a new point with the largest acquisition function value is sampled, only if that value exceeds its evaluation cost. If none meets this criterion, the current best point is resampled. This evaluation cost quantifies the potential downside of sampling a point, which is important under the cumulative regret metric as the objective function value in every iteration affects the performance measure. We further establish in theory a near-optimal regret upper bound of EIC for the squared-exponential covariance kernel under mild regularity conditions, and perform experiments to illustrate the improvement of EIC over several popular BO algorithms.
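    The per-iteration decision rule described in the abstract above might look like the sketch below. It is one reading of the abstract, not the authors' code: the acquisition and evaluation-cost values are passed in as plain callables (hypothetical placeholders, since the abstract does not define them), and the candidate set is assumed to be finite.

```python
def choose_next_point(candidates, acquisition, evaluation_cost, current_best):
    """Pick the next point to sample under an EIC-style rule (sketch).

    acquisition and evaluation_cost are assumed to map a candidate to a float.
    """
    # Keep only candidates whose acquisition value exceeds their evaluation cost.
    eligible = [x for x in candidates if acquisition(x) > evaluation_cost(x)]
    if not eligible:
        # No candidate is worth its cost: resample the current best point.
        return current_best
    # Otherwise sample the eligible candidate with the largest acquisition value.
    return max(eligible, key=acquisition)
```

    Resampling the current best point when no candidate clears its cost is what keeps each iteration's objective value in check, which is the quantity that matters under cumulative regret.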
    White-box Testing of NLP models with Mask Neuron Coverage. (arXiv:2205.05050v1 [cs.CL])
    Recent literature has seen growing interest in using black-box strategies like CheckList for testing the behavior of NLP models. Research on white-box testing has developed a number of methods for evaluating how thoroughly the internal behavior of deep models is tested, but they are not applicable to NLP models. We propose a set of white-box testing methods that are customized for transformer-based NLP models. These include Mask Neuron Coverage (MNCOVER) that measures how thoroughly the attention layers in models are exercised during testing. We show that MNCOVER can refine testing suites generated by CheckList by substantially reducing them in size, by more than 60% on average, while retaining failing tests -- thereby concentrating the fault detection power of the test suite. Further we show how MNCOVER can be used to guide CheckList input generation, evaluate alternative NLP testing methods, and drive data augmentation to improve accuracy.
    Deep learning based Chinese text sentiment mining and stock market correlation research. (arXiv:2205.04743v1 [q-fin.CP])
    We explore how to crawl financial forum data such as stock bars and combine them with deep learning models for sentiment analysis. In this paper, we use the BERT model to train on the financial corpus and predict the SZSE Component Index, and we assess the application of the BERT model to the financial corpus through a maximum information coefficient comparison study. The obtained sentiment features can reflect the fluctuations in the stock market and help to improve the prediction accuracy effectively. Meanwhile, this paper combines deep learning with financial text to further explore the mechanism by which investor sentiment affects the stock market, which will be beneficial for national regulators and policy departments in developing more reasonable policy guidelines for maintaining the stability of the stock market.
    Control Prefixes for Parameter-Efficient Text Generation. (arXiv:2110.08329v2 [cs.CL] UPDATED)
    Prefix-tuning is a powerful lightweight technique for adapting a large pre-trained language model to a downstream application. However, it uses the same dataset-level tuned prompt for all examples in the dataset. We extend this idea and propose a dynamic method, Control Prefixes, which allows for the inclusion of conditional input-dependent information, combining the benefits of prompt tuning and controlled generation. The method incorporates attribute-level learnable representations into different layers of a pre-trained transformer, allowing for the generated text to be guided in a particular direction. We provide a systematic evaluation of the technique and apply it to five datasets from the GEM benchmark for natural language generation (NLG). Although the aim is to develop a parameter-efficient model, we show Control Prefixes can even outperform full fine-tuning methods. We present state-of-the-art results on several data-to-text datasets, including WebNLG.  ( 2 min )
    Measure and Improve Robustness in NLP Models: A Survey. (arXiv:2112.08313v2 [cs.CL] UPDATED)
    As NLP models achieved state-of-the-art performances over benchmarks and gained wide applications, it has been increasingly important to ensure the safe deployment of these models in the real world, e.g., making sure the models are robust against unseen or challenging scenarios. Despite robustness being an increasingly studied topic, it has been separately explored in applications like vision and NLP, with various definitions, evaluation and mitigation strategies in multiple lines of research. In this paper, we aim to provide a unifying survey of how to define, measure and improve robustness in NLP. We first connect multiple definitions of robustness, then unify various lines of work on identifying robustness failures and evaluating models' robustness. Correspondingly, we present mitigation strategies that are data-driven, model-driven, and inductive-prior-based, with a more systematic view of how to effectively improve robustness in NLP models. Finally, we conclude by outlining open challenges and future directions to motivate further research in this area.  ( 2 min )
    Large Neighborhood Search based on Neural Construction Heuristics. (arXiv:2205.00772v2 [cs.LG] UPDATED)
    We propose a Large Neighborhood Search (LNS) approach utilizing a learned construction heuristic based on neural networks as repair operator to solve the vehicle routing problem with time windows (VRPTW). Our method uses graph neural networks to encode the problem and auto-regressively decodes a solution and is trained with reinforcement learning on the construction task without requiring any labels for supervision. The neural repair operator is combined with a local search routine, heuristic destruction operators and a selection procedure applied to a small population to arrive at a sophisticated solution approach. The key idea is to use the learned model to re-construct the partially destructed solution and to introduce randomness via the destruction heuristics (or the stochastic policy itself) to effectively explore a large neighborhood.  ( 2 min )
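    The destroy-and-repair loop that the abstract above describes can be illustrated on a much simpler problem. The sketch below is an assumption-heavy stand-in: it uses a plain travelling-salesman tour instead of VRPTW, a random destroy operator, and a greedy insertion heuristic in place of the learned neural repair operator; all function names are hypothetical.

```python
import math
import random


def tour_length(tour, points):
    """Total length of the closed tour visiting points in the given order."""
    return sum(
        math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def destroy(tour, k, rng):
    """Remove k randomly chosen cities from the tour."""
    removed = rng.sample(tour, k)
    kept = [city for city in tour if city not in removed]
    return kept, removed


def greedy_repair(partial, removed, points):
    """Re-insert each removed city at the position that lengthens the tour least."""
    tour = list(partial)
    for city in removed:
        best_pos = min(
            range(len(tour) + 1),
            key=lambda pos: tour_length(tour[:pos] + [city] + tour[pos:], points),
        )
        tour.insert(best_pos, city)
    return tour


def large_neighborhood_search(points, iterations=200, k=3, seed=0):
    """Repeatedly destroy and repair the incumbent, keeping strict improvements."""
    rng = random.Random(seed)
    best = list(range(len(points)))
    best_cost = tour_length(best, points)
    for _ in range(iterations):
        partial, removed = destroy(best, k, rng)
        candidate = greedy_repair(partial, removed, points)
        cost = tour_length(candidate, points)
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best, best_cost
```

    Swapping `greedy_repair` for a learned construction policy, as the abstract proposes, would leave the outer loop unchanged; the randomness then comes from the destroy step or from sampling the policy.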
    Explainable Deep Learning Methods in Medical Diagnosis: A Survey. (arXiv:2205.04766v1 [eess.IV])
    The remarkable success of deep learning has prompted interest in its application to medical diagnosis. Even though state-of-the-art deep learning models have achieved human-level accuracy on the classification of different types of medical data, these models are hardly adopted in clinical workflows, mainly due to their lack of interpretability. The black-box-ness of deep learning models has raised the need for devising strategies to explain the decision process of these models, leading to the creation of the topic of eXplainable Artificial Intelligence (XAI). In this context, we provide a thorough survey of XAI applied to medical diagnosis, including visual, textual, and example-based explanation methods. Moreover, this work reviews the existing medical imaging datasets and the existing metrics for evaluating the quality of the explanations. Complementary to most existing surveys, we include a performance comparison among a set of report generation-based methods. Finally, the major challenges in applying XAI to medical imaging are also discussed.  ( 2 min )
    Are Quantum Computers Practical Yet? A Case for Feature Selection in Recommender Systems using Tensor Networks. (arXiv:2205.04490v1 [cs.IR])
    Collaborative filtering models generally perform better than content-based filtering models and do not require careful feature engineering. However, in the cold-start scenario collaborative information may be scarce or even unavailable, whereas the content information may be abundant, but also noisy and expensive to acquire. Thus, selection of particular features that improve cold-start recommendations becomes an important and non-trivial task. In the recent approach by Nembrini et al., the feature selection is driven by the correlational compatibility between collaborative and content-based models. The problem is formulated as a Quadratic Unconstrained Binary Optimization (QUBO) which, due to its NP-hard complexity, is solved using Quantum Annealing on a quantum computer provided by D-Wave. Inspired by the reported results, we contest the idea that current quantum annealers are superior for this problem and instead focus on classical algorithms. In particular, we tackle QUBO via TTOpt, a recently proposed black-box optimizer based on tensor networks and multilinear algebra. We show the computational feasibility of this method for large problems with thousands of features, and empirically demonstrate that the solutions found are comparable to the ones obtained with D-Wave across all examined datasets.  ( 2 min )
    Unsupervised Belief Representation Learning with Information-Theoretic Variational Graph Auto-Encoders. (arXiv:2110.00210v5 [cs.SI] UPDATED)
    This paper develops a novel unsupervised algorithm for belief representation learning in polarized networks that (i) uncovers the latent dimensions of the underlying belief space and (ii) jointly embeds users and content items (that they interact with) into that space in a manner that facilitates a number of downstream tasks, such as stance detection, stance prediction, and ideology mapping. Inspired by total correlation in information theory, we propose the Information-Theoretic Variational Graph Auto-Encoder (InfoVGAE) that learns to project both users and content items (e.g., posts that represent user views) into an appropriate disentangled latent space. To better disentangle latent variables in that space, we develop a total correlation regularization module and a Proportional-Integral (PI) control module, and adopt a rectified Gaussian distribution to ensure orthogonality. The latent representation of users and content can then be used to quantify their ideological leaning and detect/predict their stances on issues. We evaluate the performance of the proposed InfoVGAE on three real-world datasets, of which two are collected from Twitter and one from U.S. Congress voting records. The evaluation results show that our model outperforms state-of-the-art unsupervised models by reducing user clustering errors by 10.5% and achieving 12.1% higher F1 scores for stance separation of content items. In addition, InfoVGAE produces results comparable to those of supervised models. We also discuss its performance on stance prediction and user ranking within ideological groups.  ( 3 min )
    Theory of Quantum Generative Learning Models with Maximum Mean Discrepancy. (arXiv:2205.04730v1 [quant-ph])
    The intrinsic probabilistic nature of quantum mechanics invokes endeavors of designing quantum generative learning models (QGLMs) with computational advantages over classical ones. To date, two prototypical QGLMs are quantum circuit Born machines (QCBMs) and quantum generative adversarial networks (QGANs), which approximate the target distribution in explicit and implicit ways, respectively. Despite the empirical achievements, the fundamental theory of these models remains largely obscure. To narrow this knowledge gap, here we explore the learnability of QCBMs and QGANs from the perspective of generalization when their loss is specified to be the maximum mean discrepancy. Particularly, we first analyze the generalization ability of QCBMs and identify their superiorities when the quantum devices can directly access the target distribution and the quantum kernels are employed. Next, we prove how the generalization error bound of QGANs depends on the employed Ansatz, the number of qudits, and input states. This bound can be further employed to seek potential quantum advantages in Hamiltonian learning tasks. Numerical results of QGLMs in approximating quantum states, Gaussian distribution, and ground states of parameterized Hamiltonians accord with the theoretical analysis. Our work opens the avenue for quantitatively understanding the power of quantum generative learning models.  ( 2 min )
    Matrix and graph representations of vine copula structures. (arXiv:2205.04783v1 [stat.ML])
    Vine copulas can efficiently model a large portion of probability distributions. This paper focuses on a more thorough understanding of their structures. We build on well-known existing constructions to represent vine copulas with graphs as well as matrices. The graph representations include the regular, cherry and chordal graph sequence structures, which we show to be equivalent. Importantly, we also show that when a perfect elimination ordering of a vine structure is given, the structure can always be uniquely represented with a matrix. O. M. Nápoles has shown a way to represent them in a matrix, and we formalize this previous approach as an algorithm, while also showing a new method for constructing such a matrix through cherry tree sequences. Lastly, we prove that these two matrix-building algorithms are equivalent if the same perfect elimination ordering is used.  ( 2 min )
    Modeling Regime Shifts in Multiple Time Series. (arXiv:2109.09692v3 [cs.LG] UPDATED)
    We investigate the problem of discovering and modeling regime shifts in an ecosystem comprising multiple time series known as co-evolving time series. Regime shifts refer to the changing behaviors exhibited by series at different time intervals. Learning these changing behaviors is a key step toward time series forecasting. While advances have been made, existing methods suffer from one or more of the following shortcomings: (1) failure to take relationships between time series into consideration for discovering regimes in multiple time series; (2) lack of an effective approach that models time-dependent behaviors exhibited by series; (3) difficulties in handling data discontinuities which may be informative. Most of the existing methods are unable to handle all of these three issues in a unified framework. This, therefore, motivates our effort to devise a principled approach for modeling interactions and time-dependency in co-evolving time series. Specifically, we model an ecosystem of multiple time series by summarizing the heavy ensemble of time series into a lighter and more meaningful structure called a mapping grid. By using the mapping grid, our model first learns time series behavioral dependencies through a dynamic network representation, then learns the regime transition mechanism via a full time-dependent Cox regression model. The originality of our approach lies in modeling interactions between time series in regime identification and in modeling time-dependent regime transition probabilities, usually assumed to be static in existing work.  ( 2 min )
    DeepTag: A General Framework for Fiducial Marker Design and Detection. (arXiv:2105.13731v2 [cs.CV] UPDATED)
    A fiducial marker system usually consists of markers, a detection algorithm, and a coding system. The appearance of markers and the detection robustness are generally limited by the existing detection algorithms, which are hand-crafted with traditional low-level image processing techniques. Furthermore, a sophisticatedly designed coding system is required to overcome the shortcomings of both markers and detection algorithms. To improve the flexibility and robustness in various applications, we propose a general deep learning based framework, DeepTag, for fiducial marker design and detection. DeepTag not only supports detection of a wide variety of existing marker families, but also makes it possible to design new marker families with customized local patterns. Moreover, we propose an effective procedure to synthesize training data on the fly without manual annotations. Thus, DeepTag can easily adapt to existing and newly-designed marker families. To validate DeepTag and existing methods, beside existing datasets, we further collect a new large and challenging dataset where markers are placed in different view distances and angles. Experiments show that DeepTag well supports different marker families and greatly outperforms the existing methods in terms of both detection robustness and pose accuracy. Both code and dataset are available at https://herohuyongtao.github.io/research/publications/deep-tag/.  ( 2 min )
    THOR: Threshold-Based Ranking Loss for Ordinal Regression. (arXiv:2205.04864v1 [cs.LG])
    In this work, we present a regression-based ordinal regression algorithm for supervised classification of instances into ordinal categories. In contrast to previous methods, in this work the decision boundaries between categories are predefined, and the algorithm learns to project the input examples onto their appropriate scores according to these predefined boundaries. This is achieved by adding a novel threshold-based pairwise loss function that aims at minimizing the regression error, which in turn minimizes the Mean Absolute Error (MAE) measure. We implemented our proposed architecture-agnostic method using the CNN-framework for feature extraction. Experimental results on five real-world benchmarks demonstrate that the proposed algorithm achieves the best MAE results compared to state-of-the-art ordinal regression algorithms.  ( 2 min )
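The setting described above (a regressor outputs a score, and fixed boundaries turn that score into an ordinal category) can be illustrated with a short sketch. This is not the paper's loss function, which the abstract does not specify; the boundary values and function names are hypothetical and only show how predefined thresholds and the MAE metric fit together.

```python
import bisect

def score_to_category(score, boundaries):
    """Map a real-valued score to an ordinal category index.

    `boundaries` must be sorted ascending; k boundaries define k + 1 categories.
    """
    return bisect.bisect_right(boundaries, score)

def mean_absolute_error(true_labels, predicted_labels):
    """MAE between two equal-length sequences of ordinal category indices."""
    pairs = list(zip(true_labels, predicted_labels))
    return sum(abs(t - p) for t, p in pairs) / len(pairs)

# Hypothetical predefined boundaries separating four ordinal categories (0..3).
BOUNDARIES = [0.5, 1.5, 2.5]
```

For example, a model score of 1.4 falls between the first and second boundary and is therefore assigned category 1.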
    Designing a Recurrent Neural Network to Learn a Motion Planner for High-Dimensional Inputs. (arXiv:2205.04799v1 [cs.RO])
    The use of machine learning in the self-driving industry has boosted a number of recent advancements. In particular, the usage of large deep learning models in the perception and prediction stack has proved quite successful, but significant literature on the use of machine learning in the planning stack is still lacking. The current state of the art in the planning stack often relies on fast constrained optimization or rule-based approaches. Both of these techniques fail to address a significant number of fundamental problems whose solution would allow the vehicle to operate more like a human driver. In this paper, we attempt to design a basic deep learning system to approach this problem. Furthermore, the main underlying goal of this paper is to demonstrate the potential uses of machine learning in the planning stack for autonomous vehicles (AV) and provide a baseline work for ongoing and future research.  ( 2 min )
    Knowledge Augmented Machine Learning with Applications in Autonomous Driving: A Survey. (arXiv:2205.04712v1 [cs.LG])
    The existence of representative datasets is a prerequisite of many successful artificial intelligence and machine learning models. However, the subsequent application of these models often involves scenarios that are inadequately represented in the data used for training. The reasons for this are manifold and range from time and cost constraints to ethical considerations. As a consequence, the reliable use of these models, especially in safety-critical applications, is a huge challenge. Leveraging additional, already existing sources of knowledge is key to overcome the limitations of purely data-driven approaches, and eventually to increase the generalization capability of these models. Furthermore, predictions that conform with knowledge are crucial for making trustworthy and safe decisions even in underrepresented scenarios. This work provides an overview of existing techniques and methods in the literature that combine data-based models with existing knowledge. The identified approaches are structured according to the categories integration, extraction and conformity. Special attention is given to applications in the field of autonomous driving.  ( 2 min )
    Secure and Private Source Coding with Private Key and Decoder Side Information. (arXiv:2205.05068v1 [cs.IT])
    The problem of secure source coding with multiple terminals is extended by considering a remote source whose noisy measurements are the correlated random variables used for secure source reconstruction. The main additions to the problem include 1) all terminals noncausally observe a noisy measurement of the remote source; 2) a private key is available to all legitimate terminals; 3) the public communication link between the encoder and decoder is rate-limited; 4) the secrecy leakage to the eavesdropper is measured with respect to the encoder input, whereas the privacy leakage is measured with respect to the remote source. Exact rate regions are characterized for a lossy source coding problem with a private key, remote source, and decoder side information under security, privacy, communication, and distortion constraints. By replacing the distortion constraint with a reliability constraint, we obtain the exact rate region also for the lossless case. Furthermore, the lossy rate region for scalar discrete-time Gaussian sources and measurement channels is established.  ( 2 min )
    Explainable Data Imputation using Constraints. (arXiv:2205.04731v1 [cs.AI])
    Data values in a dataset can be missing or anomalous due to mishandling or human error. Analysing data with missing values can create bias and affect the inferences. Several analysis methods, such as principal component analysis or singular value decomposition, require complete data. Many approaches impute numeric data and some do not consider dependency of attributes on other attributes, while some require human intervention and domain knowledge. We present a new algorithm for data imputation based on different data type values and their association constraints in data, which are not handled currently by any system. We show experimental results using different metrics comparing our algorithm with state-of-the-art imputation techniques. Our algorithm not only imputes the missing values but also generates human-readable explanations describing the significance of attributes used for every imputation.  ( 2 min )
    Sensible AI: Re-imagining Interpretability and Explainability using Sensemaking Theory. (arXiv:2205.05057v1 [cs.HC])
    Understanding how ML models work is a prerequisite for responsibly designing, deploying, and using ML-based systems. With interpretability approaches, ML can now offer explanations for its outputs to aid human understanding. Though these approaches rely on guidelines for how humans explain things to each other, they ultimately solve for improving the artifact -- an explanation. In this paper, we propose an alternate framework for interpretability grounded in Weick's sensemaking theory, which focuses on who the explanation is intended for. Recent work has advocated for the importance of understanding stakeholders' needs -- we build on this by providing concrete properties (e.g., identity, social context, environmental cues, etc.) that shape human understanding. We use an application of sensemaking in organizations as a template for discussing design guidelines for Sensible AI, AI that factors in the nuances of human cognition when trying to explain itself.  ( 2 min )
    ALLSH: Active Learning Guided by Local Sensitivity and Hardness. (arXiv:2205.04980v1 [cs.CL])
    Active learning, which effectively collects informative unlabeled data for annotation, reduces the demand for labeled data. In this work, we propose to retrieve unlabeled samples with a local sensitivity and hardness-aware acquisition function. The proposed method generates data copies through local perturbations and selects data points whose predictive likelihoods diverge the most from their copies. We further empower our acquisition function by injecting the select-worst case perturbation. Our method achieves consistent gains over the commonly used active learning strategies in various classification tasks. Furthermore, we observe consistent improvements over the baselines on the study of prompt selection in prompt-based few-shot learning. These experiments demonstrate that our acquisition guided by local sensitivity and hardness can be effective and beneficial for many NLP tasks.  ( 2 min )
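The selection rule described above (pick the points whose predictions diverge most from those of their perturbed copies) can be sketched as follows. The sketch assumes that class-probability vectors for each sample and for its perturbed copy are already available, and it uses KL divergence as the divergence measure; both the measure and the function names are assumptions, since the abstract does not fix them.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def select_for_annotation(original_probs, perturbed_probs, budget):
    """Return indices of the `budget` samples whose predictions change most.

    original_probs[i] and perturbed_probs[i] are the model's class
    probabilities for sample i and for its locally perturbed copy.
    """
    scores = [kl_divergence(p, q) for p, q in zip(original_probs, perturbed_probs)]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]
```

Samples whose predictions are stable under perturbation receive a score near zero and are left unlabeled until the budget allows.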
    Tensor-based Collaborative Filtering With Smooth Ratings Scale. (arXiv:2205.05070v1 [cs.IR])
    Conventional collaborative filtering techniques don't take into consideration the effect of discrepancy in users' rating perception. Some users may rarely give 5 stars to items while others almost always assign 5 stars to the chosen item. Even if they had experience with the same items, this systematic discrepancy in their evaluation style will lead to systematic errors in the ability of a recommender system to effectively extract the right patterns from data. To mitigate this problem, we introduce a ratings similarity matrix, which represents the dependency between different values of ratings on the population level. Hence, if correlations between ratings exist on average, it is possible to improve the quality of proposed recommendations by offsetting the effect of users' ratings that are shifted down or shifted up.  ( 2 min )
    Nearly Minimax Optimal Regret for Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation. (arXiv:2102.07301v2 [cs.LG] UPDATED)
    We study reinforcement learning in an infinite-horizon average-reward setting with linear function approximation, where the transition probability function of the underlying Markov Decision Process (MDP) admits a linear form over a feature mapping of the current state, action, and next state. We propose a new algorithm UCRL2-VTR, which can be seen as an extension of the UCRL2 algorithm with linear function approximation. We show that UCRL2-VTR with Bernstein-type bonus can achieve a regret of $\tilde{O}(d\sqrt{DT})$, where $d$ is the dimension of the feature mapping, $T$ is the horizon, and $\sqrt{D}$ is the diameter of the MDP. We also prove a matching lower bound $\tilde{\Omega}(d\sqrt{DT})$, which suggests that the proposed UCRL2-VTR is minimax optimal up to logarithmic factors. To the best of our knowledge, our algorithm is the first nearly minimax optimal RL algorithm with function approximation in the infinite-horizon average-reward setting.  ( 2 min )
    Reconstruction Enhanced Multi-View Contrastive Learning for Anomaly Detection on Attributed Networks. (arXiv:2205.04816v1 [cs.LG])
    Detecting abnormal nodes from attributed networks is of great importance in many real applications, such as financial fraud detection and cyber security. This task is challenging due to both the complex interactions between anomalous nodes and their counterparts and their inconsistency in terms of attributes. This paper proposes a self-supervised learning framework that jointly optimizes a multi-view contrastive learning-based module and an attribute reconstruction-based module to more accurately detect anomalies on attributed networks. Specifically, two contrastive learning views are first established, which allow the model to better encode rich local and global information related to the abnormality. Motivated by the attribute consistency principle between neighboring nodes, a masked autoencoder-based reconstruction module is also introduced to identify the nodes that have large reconstruction errors, which are then regarded as anomalies. Finally, the two complementary modules are integrated for more accurately detecting the anomalous nodes. Extensive experiments conducted on five benchmark datasets show our model outperforms current state-of-the-art models.  ( 2 min )
    Universal Caching. (arXiv:2205.04860v1 [cs.IT])
    In the learning literature, the performance of an online policy is commonly measured in terms of the static regret metric, which compares the cumulative loss of an online policy to that of an optimal benchmark in hindsight. In the definition of static regret, the benchmark policy remains fixed throughout the time horizon. Naturally, the resulting regret bounds become loose in non-stationary settings where fixed benchmarks often suffer from poor performance. In this paper, we investigate a stronger notion of regret minimization in the context of an online caching problem. In particular, we allow the action of the offline benchmark at any round to be decided by a finite state predictor containing arbitrarily many states. Using ideas from the universal prediction literature in information theory, we propose an efficient online caching policy with an adaptive sub-linear regret bound. To the best of our knowledge, this is the first data-dependent regret bound known for the universal caching problem. We establish this result by combining a recently-proposed online caching policy with an incremental parsing algorithm, e.g., Lempel-Ziv '78. Our methods also yield a simpler learning-theoretic proof of the improved regret bound as opposed to the more involved and problem-specific combinatorial arguments used in the earlier works.  ( 2 min )
    Weakly-supervised segmentation of referring expressions. (arXiv:2205.04725v1 [cs.CV])
    Visual grounding localizes regions (boxes or segments) in the image corresponding to given referring expressions. In this work we address image segmentation from referring expressions, a problem that has so far only been addressed in a fully-supervised setting. A fully-supervised setup, however, requires pixel-wise supervision and is hard to scale given the expense of manual annotation. We therefore introduce a new task of weakly-supervised image segmentation from referring expressions and propose Text grounded semantic SEGmentation (TSEG) that learns segmentation masks directly from image-level referring expressions without pixel-level annotations. Our transformer-based method computes patch-text similarities and guides the classification objective during training with a new multi-label patch assignment mechanism. The resulting visual grounding model segments image regions corresponding to given natural language expressions. Our approach TSEG demonstrates promising results for weakly-supervised referring expression segmentation on the challenging PhraseCut and RefCOCO datasets. TSEG also shows competitive performance when evaluated in a zero-shot setting for semantic segmentation on Pascal VOC.  ( 2 min )
    Flow Completion Network: Inferring the Fluid Dynamics from Incomplete Flow Information using Graph Neural Networks. (arXiv:2205.04739v1 [physics.flu-dyn])
    This paper introduces a novel neural network -- the flow completion network (FCN) -- to infer the fluid dynamics, including the flow field and the force acting on the body, from the incomplete data based on Graph Convolution Attention Network. The FCN is composed of several graph convolution layers and spatial attention layers. It is designed to infer the velocity field and the vortex force contribution of the flow field when combined with the vortex force map (VFM) method. Compared with other neural networks adopted in fluid dynamics, the FCN is capable of dealing with both structured data and unstructured data. The performance of the proposed FCN is assessed by the computational fluid dynamics (CFD) data on the flow field around a circular cylinder. The force coefficients predicted by our model are validated against those obtained directly from CFD. Moreover, it is shown that our model effectively utilizes the existing flow field information and the gradient information simultaneously, giving a better performance than the traditional CNN-based and DNN-based models.  ( 2 min )
    AI training resources for GLAM: a snapshot. (arXiv:2205.04738v1 [cs.LG])
    We take a snapshot of current resources available for teaching and learning AI with a focus on the Galleries, Libraries, Archives and Museums (GLAM) community. The review was carried out in 2021 and 2022. The review provides an overview of material we identified as being relevant, offers a description of this material and makes recommendations for future work in this area.  ( 2 min )
    A spatial-temporal short-term traffic flow prediction model based on dynamical-learning graph convolution mechanism. (arXiv:2205.04762v1 [cs.LG])
    Short-term traffic flow prediction is a vital branch of the Intelligent Traffic System (ITS) and plays an important role in traffic management. Graph convolution network (GCN) is widely used in traffic prediction models to better deal with the graphical structure data of road networks. However, the influence weights among different road sections are usually distinct in real life, and hard to analyze manually. The traditional GCN mechanism, relying on a manually set adjacency matrix, is unable to dynamically learn such spatial patterns during training. To deal with this drawback, this paper proposes a novel location graph convolutional network (Location-GCN). Location-GCN solves this problem by adding a new learnable matrix into the GCN mechanism, using the absolute value of this matrix to represent the distinct influence levels among different nodes. Then, long short-term memory (LSTM) is employed in the proposed traffic prediction model. Moreover, trigonometric function encoding is used in this study to enable the short-term input sequence to convey long-term periodic information. Ultimately, the proposed model is compared with the baseline models and evaluated on two real-world traffic flow datasets. The results show our model is more accurate and robust on both datasets than other representative traffic prediction models.  ( 2 min )
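The abstract does not say which trigonometric functions or periods the encoding uses. As an illustration only, the common sine/cosine cyclical encoding of time of day is sketched below; the daily period of 1440 minutes and the function name are assumptions.

```python
import math

def encode_time_of_day(minute_of_day, period=1440):
    """Encode a timestamp as a point on the unit circle.

    Times that are close on the clock (e.g. 23:59 and 00:00) get nearby
    encodings, which a raw minute count would not provide.
    """
    angle = 2.0 * math.pi * (minute_of_day % period) / period
    return math.sin(angle), math.cos(angle)
```

Appending these two values to each input step gives a short input window a signal about where it sits in the daily cycle; a weekly cycle could be added in the same way with a longer period.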
    A Verification Framework for Certifying Learning-Based Safety-Critical Aviation Systems. (arXiv:2205.04590v1 [eess.SY])
    We present a safety verification framework for design-time and run-time assurance of learning-based components in aviation systems. Our proposed framework integrates two novel methodologies. From the design-time assurance perspective, we propose offline mixed-fidelity verification tools that incorporate knowledge from different levels of granularity in simulated environments. From the run-time assurance perspective, we propose reachability- and statistics-based online monitoring and safety guards for a learning-based decision-making model to complement the offline verification methods. This framework is designed to be loosely coupled among modules, allowing the individual modules to be developed using independent methodologies and techniques, under varying circumstances and with different tool access. The proposed framework offers feasible solutions for meeting system safety requirements at different stages throughout the system development and deployment cycle, enabling the continuous learning and assessment of the system product.  ( 2 min )
    On Causality in Domain Adaptation and Semi-Supervised Learning: an Information-Theoretic Analysis. (arXiv:2205.04641v1 [cs.LG])
    The establishment of the link between causality and unsupervised domain adaptation (UDA)/semi-supervised learning (SSL) has led to methodological advances in these learning problems in recent years. However, a formal theory that explains the role of causality in the generalization performance of UDA/SSL is still lacking. In this paper, we consider the UDA/SSL setting where we access m labeled source data and n unlabeled target data as training instances under a parametric probabilistic model. We study the learning performance (e.g., excess risk) of prediction in the target domain. Specifically, we distinguish two scenarios: the learning problem is called causal learning if the feature is the cause and the label is the effect, and is called anti-causal learning otherwise. We show that in causal learning, the excess risk depends on the size of the source sample at a rate of O(1/m) only if the labelling distribution between the source and target domains remains unchanged. In anti-causal learning, we show that the unlabeled data dominate the performance at a rate of typically O(1/n). Our analysis is based on the notion of potential outcome random variables and information theory. These results bring out the relationship between the data sample size and the hardness of the learning problem with different causal mechanisms.  ( 2 min )
    Surreal-GAN: Semi-Supervised Representation Learning via GAN for uncovering heterogeneous disease-related imaging patterns. (arXiv:2205.04523v1 [cs.LG])
    A plethora of machine learning methods have been applied to imaging data, enabling the construction of clinically relevant imaging signatures of neurological and neuropsychiatric diseases. Oftentimes, such methods don't explicitly model the heterogeneity of disease effects, or approach it via nonlinear models that are not interpretable. Moreover, unsupervised methods may parse heterogeneity that is driven by nuisance confounding factors that affect brain structure or function, rather than heterogeneity relevant to a pathology of interest. On the other hand, semi-supervised clustering methods seek to derive a dichotomous subtype membership, ignoring the truth that disease heterogeneity spatially and temporally extends along a continuum. To address the aforementioned limitations, herein, we propose a novel method, termed Surreal-GAN (Semi-SUpeRvised ReprEsentAtion Learning via GAN). Using cross-sectional imaging data, Surreal-GAN dissects underlying disease-related heterogeneity under the principle of semi-supervised clustering (cluster mappings from normal control to patient), proposes a continuously dimensional representation, and infers the disease severity of patients at individual level along each dimension. The model first learns a transformation function from normal control (CN) domain to the patient (PT) domain with latent variables controlling transformation directions. An inverse mapping function together with regularization on function continuity, pattern orthogonality and monotonicity was also imposed to make sure that the transformation function captures necessarily meaningful imaging patterns with clinical significance. We first validated the model through extensive semi-synthetic experiments, and then demonstrate its potential in capturing biologically plausible imaging patterns in Alzheimer's disease (AD).  ( 2 min )
    Affective Medical Estimation and Decision Making via Visualized Learning and Deep Learning. (arXiv:2205.04599v1 [cs.LG])
    Sophisticated machine learning (ML) techniques have yielded promising results, especially in medical applications, where they have been investigated for different tasks to enhance the decision-making process. Since visualization is such an effective tool for human comprehension, memorization, and judgment, we present a first-of-its-kind estimation approach we refer to as Visualized Learning for Machine Learning (VL4ML) that not only can serve to assist physicians and clinicians in making reasoned medical decisions, but also makes the uncertainty visible, which could raise incertitude in making the appropriate classification or prediction. For the proof of concept, and to demonstrate the generalized nature of this visualized estimation approach, five different case studies are examined for different types of tasks including classification, regression, and longitudinal prediction. A survey analysis with more than 100 individuals is also conducted to assess users' feedback on this visualized estimation method. The experiments and the survey demonstrate the practical merits of VL4ML, which include: (1) visually appreciating clinical/medical estimations; (2) getting closer to the patients' preferences; (3) improving doctor-patient communication; and (4) visualizing the uncertainty introduced through the black box effect of the deployed ML algorithm. All source code is shared via a GitHub repository.  ( 2 min )
    Statistical Guarantees for Approximate Stationary Points of Simple Neural Networks. (arXiv:2205.04491v1 [cs.LG])
    Since statistical guarantees for neural networks are usually restricted to global optima of intricate objective functions, it is not clear whether these theories really explain the performances of actual outputs of neural-network pipelines. The goal of this paper is, therefore, to bring statistical theory closer to practice. We develop statistical guarantees for simple neural networks that coincide up to logarithmic factors with the global optima but apply to stationary points and the points nearby. These results support the common notion that neural networks do not necessarily need to be optimized globally from a mathematical perspective. More generally, despite being limited to simple neural networks for now, our theories make a step forward in describing the practical properties of neural networks in mathematical terms.  ( 2 min )
    Differentiable Electron Microscopy Simulation: Methods and Applications for Visualization. (arXiv:2205.04464v1 [q-bio.QM])
    We propose a new microscopy simulation system that can depict atomistic models in a micrograph visual style, similar to results of physical electron microscopy imaging. This system is scalable, able to represent simulation of electron microscopy of tens of viral particles, and synthesizes the image faster than previous methods. On top of that, the simulator is differentiable in both its deterministic and stochastic stages, which form the signal and noise representations in the micrograph. This property makes it possible to solve inverse problems by means of optimization and thus allows for generation of microscopy simulations using the parameter settings estimated from real data. We demonstrate this learning capability through two applications: (1) estimating the parameters of the modulation transfer function defining the detector properties of the simulated and real micrographs, and (2) denoising the real data based on parameters trained from the simulated examples. While current simulators do not support any parameter estimation due to their forward design, we show that the results obtained using estimated parameters are very similar to the results of real micrographs. Additionally, we evaluate the denoising capabilities of our approach and show that it improves over state-of-the-art methods. Denoised micrographs exhibit less noise in the tilt-series tomography reconstructions, ultimately reducing the visual dominance of noise in direct volume rendering of microscopy tomograms.  ( 2 min )
    Serving and Optimizing Machine Learning Workflows on Heterogeneous Infrastructures. (arXiv:2205.04713v1 [cs.LG])
    With the advent of ubiquitous deployment of smart devices and the Internet of Things, data sources for machine learning inference have increasingly moved to the edge of the network. Existing machine learning inference platforms typically assume a homogeneous infrastructure and do not take into account the more complex and tiered computing infrastructure that includes edge devices, local hubs, edge datacenters, and cloud datacenters. On the other hand, recent machine learning efforts have provided viable solutions for model compression, pruning and quantization for heterogeneous environments; for a machine learning model, now we may easily find or even generate a series of models with different tradeoffs between accuracy and efficiency. We design and implement JellyBean, a framework for serving and optimizing machine learning inference workflows on heterogeneous infrastructures. Given service-level objectives (e.g., throughput, accuracy), JellyBean automatically selects the most cost-efficient models that meet the accuracy target and decides how to deploy them across different tiers of infrastructures. Evaluations show that JellyBean reduces the total serving cost of visual question answering by up to 58%, and vehicle tracking from the NVIDIA AI City Challenge by up to 36% compared with state-of-the-art model selection and worker assignment solutions. JellyBean also outperforms prior ML serving systems (e.g., Spark on the cloud) up to 5x in serving costs.  ( 2 min )
    DNS based In-Browser Cryptojacking Detection. (arXiv:2205.04685v1 [cs.CR])
    The metadata aspect of Domain Names (DNs) enables us to perform a behavioral study of DNs and detect if a DN is involved in in-browser cryptojacking. Thus, we are motivated to study different temporal and behavioral aspects of DNs involved in cryptojacking. We use temporal features such as query frequency and query burst along with graph-based features such as degree and diameter, and non-temporal features such as string-based ones, to detect if a DN is suspected of being involved in in-browser cryptojacking. Then, we use them to train Machine Learning (ML) algorithms over different temporal granularities such as 2-hour datasets and the complete dataset. Our results show that the DecisionTrees classifier performs the best with 59.5% Recall on cryptojacked DNs, while for unsupervised learning, K-Means with K=2 performs the best. Similarity analysis of the features reveals a minimal divergence between the cryptojacking DNs and other already known malicious DNs. It also reveals the need for improvements in the feature set of state-of-the-art methods to improve their accuracy in detecting in-browser cryptojacking. As an added analysis, our signature-based analysis identifies that none of the Indian Government websites were involved in cryptojacking during October-December 2021. However, based on the resource utilization, we identify 10 DNs with different properties than others.  ( 2 min )
    Stabilized Doubly Robust Learning for Recommendation on Data Missing Not at Random. (arXiv:2205.04701v1 [cs.LG])
    In recommender systems, users always choose favorite items to rate, which results in data missing not at random and poses a great challenge for unbiased evaluation and learning of prediction models. Currently, the doubly robust (DR) method and its variants have been widely studied and demonstrate superior performance. However, we show that DR methods are unstable to extremely small propensities and rely on extrapolations, resulting in sub-optimal performances. In this paper, we propose a stabilized doubly robust (SDR) estimator to address the above limitations while retaining double robustness. Theoretical analysis shows that SDR has bounded bias, variance and generalization error bound under inaccurate imputed errors and arbitrarily small propensities. In addition, we propose a novel learning approach for SDR that updates the imputation, propensity, and prediction models cyclically, achieving more stable and accurate predictions. Extensive experiments show that our approach significantly outperforms the existing methods.  ( 2 min )
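    The instability described above can be illustrated with the standard doubly robust (DR) estimator, which averages the imputed error plus a propensity-weighted correction on observed entries. The sketch below is a minimal illustration of that textbook DR form, not the paper's SDR estimator; the function name and the toy numbers are assumptions made for the example.

```python
def dr_estimate(errors, imputed, observed, propensities):
    """Textbook doubly robust estimate of the average prediction error.

    errors:       true errors e_ui (only used where observed == 1)
    imputed:      imputed errors e_hat_ui for every user-item pair
    observed:     1 if the rating was observed, else 0
    propensities: estimated probability that each pair is observed
    """
    total = 0.0
    for e, e_hat, o, p in zip(errors, imputed, observed, propensities):
        # Imputed error plus an inverse-propensity-weighted correction.
        total += e_hat + o * (e - e_hat) / p
    return total / len(errors)


# Toy data (illustrative values only).
errors = [1.0, 0.5, 0.2, 0.8]
imputed = [0.9, 0.5, 0.3, 0.6]
observed = [1, 0, 1, 0]

moderate = dr_estimate(errors, imputed, observed, [0.5, 0.5, 0.5, 0.5])
# A single tiny propensity inflates the correction term of the first pair.
unstable = dr_estimate(errors, imputed, observed, [0.001, 0.5, 0.5, 0.5])
# With perfect imputation the propensities no longer matter.
exact = dr_estimate(errors, errors, observed, [0.001, 0.5, 0.5, 0.5])
```

    With a single propensity of 0.001, one small imputation error dominates the whole average, which is the sensitivity the abstract refers to; when the imputed errors are exact, the estimate is unaffected by the propensities.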
    Improving genetic risk prediction across diverse population by disentangling ancestry representations. (arXiv:2205.04673v1 [cs.LG])
    Risk prediction models using genetic data have seen increasing traction in genomics. However, most of the polygenic risk models were developed using data from participants with similar (mostly European) ancestry. This can lead to biases in the risk predictors resulting in poor generalization when applied to minority populations and admixed individuals such as African Americans. To address this bias, largely due to the prediction models being confounded by the underlying population structure, we propose a novel deep-learning framework that leverages data from diverse populations and disentangles ancestry from the phenotype-relevant information in its representation. The ancestry disentangled representation can be used to build risk predictors that perform better across minority populations. We applied the proposed method to the analysis of Alzheimer's disease genetics. Compared with standard linear and nonlinear risk prediction methods, the proposed method substantially improves risk prediction in minority populations, particularly for admixed individuals.  ( 2 min )
    Real-time Forecasting of Time Series in Financial Markets Using Sequentially Trained Many-to-one LSTMs. (arXiv:2205.04678v1 [cs.LG])
    Financial markets are highly complex and volatile; thus, learning about such markets for the sake of making predictions is vital to make early alerts about crashes and subsequent recoveries. People have been using learning tools from diverse fields such as financial mathematics and machine learning in an attempt to make trustworthy predictions on such markets. However, the accuracy of such techniques had not been adequate until artificial neural network (ANN) frameworks were developed. Moreover, making accurate real-time predictions of financial time series is highly dependent on the ANN architecture in use and the procedure of training it. Long short-term memory (LSTM) is a member of the recurrent neural network family which has been widely utilized for time series predictions. Specifically, we train two LSTMs with a known length, say $T$ time steps, of previous data and predict only one time step ahead. At each iteration, while one LSTM is employed to find the best number of epochs, the second LSTM is trained only for the best number of epochs to make predictions. We treat the current prediction as part of the training set for the next prediction and train the same LSTM. While classic ways of training result in more error when the predictions are made further away in the test period, our approach is capable of maintaining a superior accuracy as training increases when it proceeds through the testing period. The forecasting accuracy of our approach is validated using three time series from each of the three diverse financial markets: stock, cryptocurrency, and commodity. The results are compared with those of an extended Kalman filter, an autoregressive model, and an autoregressive integrated moving average model.  ( 2 min )
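    The many-to-one setup and the sequential retraining loop can be sketched as follows. This is only an illustration of the data flow: the LSTM is replaced by a trivial placeholder model (average one-step drift), and the names make_windows, fit_mean_step and walk_forward are hypothetical, not taken from the paper.

```python
def make_windows(series, T):
    """Many-to-one pairs: T consecutive values as input, the next value as target."""
    return [(series[i:i + T], series[i + T]) for i in range(len(series) - T)]


def fit_mean_step(pairs):
    """Placeholder for the LSTM: learns the average one-step change."""
    steps = [target - window[-1] for window, target in pairs]
    drift = sum(steps) / len(steps)
    return lambda window: window[-1] + drift


def walk_forward(series, T, horizon, fit=fit_mean_step):
    """Predict one step ahead, add the prediction to the training data, retrain."""
    history = list(series)
    predictions = []
    for _ in range(horizon):
        model = fit(make_windows(history, T))   # retrain on everything so far
        y_hat = model(history[-T:])             # one-step-ahead prediction
        predictions.append(y_hat)
        history.append(y_hat)                   # treat prediction as training data
    return predictions


pairs = make_windows([1, 2, 3, 4, 5], 2)
forecast = walk_forward([1.0, 2.0, 3.0, 4.0], T=2, horizon=3)
```

    Swapping fit_mean_step for a function that trains a real sequence model would keep the loop unchanged; the two-LSTM epoch selection described in the abstract is not modelled here.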
    SuMe: A Dataset Towards Summarizing Biomedical Mechanisms. (arXiv:2205.04652v1 [cs.CL])
    Can language models read biomedical texts and explain the biomedical mechanisms discussed? In this work we introduce a biomedical mechanism summarization task. Biomedical studies often investigate the mechanisms behind how one entity (e.g., a protein or a chemical) affects another in a biological context. The abstracts of these publications often include a focused set of sentences that present relevant supporting statements regarding such relationships, associated experimental evidence, and a concluding sentence that summarizes the mechanism underlying the relationship. We leverage this structure and create a summarization task, where the input is a collection of sentences and the main entities in an abstract, and the output includes the relationship and a sentence that summarizes the mechanism. Using a small amount of manually labeled mechanism sentences, we train a mechanism sentence classifier to filter a large biomedical abstract collection and create a summarization dataset with 22k instances. We also introduce conclusion sentence generation as a pretraining task with 611k instances. We benchmark the performance of large bio-domain language models. We find that while the pretraining task helps improve performance, the best model produces acceptable mechanism outputs in only 32% of the instances, which shows the task presents significant challenges in biomedical language understanding and summarization.  ( 2 min )
    On some studies of Fraud Detection Pipeline and related issues from the scope of Ensemble Learning and Graph-based Learning. (arXiv:2205.04626v1 [cs.LG])
    The UK anti-fraud charity Fraud Advisory Panel (FAP) in their review of 2016 estimates business costs of fraud at 144 billion, and its individual counterpart at 9.7 billion. Banking, insurance, manufacturing, and government are the most common industries affected by fraud activities. Designing an efficient fraud detection system could avoid losing the money; however, building this system is challenging due to many difficult problems, e.g., imbalanced data, computing costs, etc. Over the last three decades, there has been various research related to fraud detection but no agreement on what is the best approach to build the fraud detection system. In this thesis, we aim to answer some questions such as i) how to build a simplified and effective Fraud Detection System that is not only easy to implement but also provides reliable results, where our proposed Fraud Detection Pipeline is a potential backbone of the system and is easy to extend or upgrade, ii) when to update models in our system (and keep the accuracy stable) in order to reduce the cost of the updating process, iii) how to deal with an extreme imbalance in big data classification problems, e.g. fraud detection, since this is the gap between two difficult problems, iv) further, how to apply graph-based semi-supervised learning to detect fraudulent transactions.  ( 2 min )
    Risk Aversion In Learning Algorithms and an Application To Recommendation Systems. (arXiv:2205.04619v1 [cs.LG])
    Consider a bandit learning environment. We demonstrate that popular learning algorithms such as Upper Confidence Bound (UCB) and $\varepsilon$-Greedy exhibit risk aversion: when presented with two arms of the same expectation, but different variance, the algorithms tend to not choose the riskier, i.e. higher variance, arm. We prove that $\varepsilon$-Greedy chooses the risky arm with probability tending to $0$ when faced with a deterministic and a Rademacher-distributed arm. We show experimentally that UCB also shows risk-averse behavior, and that risk aversion is present persistently in early rounds of learning even if the riskier arm has a slightly higher expectation. We calibrate our model to a recommendation system and show that algorithmic risk aversion can decrease consumer surplus and increase homogeneity. We discuss several extensions to other bandit algorithms and reinforcement learning, and investigate the impacts of algorithmic risk aversion for decision theory.  ( 2 min )
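    The two-arm setting can be simulated in a few lines. The sketch below is an assumption-laden illustration, not the paper's experiment: it uses a constant exploration rate (epsilon = 0.1), a safe arm that always pays 0, a Rademacher arm that pays +1 or -1 with equal probability, and uniform tie-breaking. It illustrates the tendency to avoid the risky arm rather than the limiting result stated in the abstract.

```python
import random


def risky_share(rounds, epsilon, rng):
    """Fraction of rounds in which epsilon-greedy pulls the Rademacher arm."""
    total, pulls, risky_rounds = 0, 0, 0  # reward sum / pull count of the risky arm
    for _ in range(rounds):
        mean_risky = total / pulls if pulls else 0.0
        if rng.random() < epsilon:
            pick_risky = rng.random() < 0.5       # explore: uniform over both arms
        elif mean_risky == 0.0:
            pick_risky = rng.random() < 0.5       # tie with the safe arm (mean 0)
        else:
            pick_risky = mean_risky > 0.0         # exploit the better empirical mean
        if pick_risky:
            total += 1 if rng.random() < 0.5 else -1   # Rademacher reward
            pulls += 1
            risky_rounds += 1
    return risky_rounds / rounds


def average_risky_share(runs=300, rounds=2000, epsilon=0.1, seed=0):
    """Average the risky-arm share over independent runs."""
    rng = random.Random(seed)
    return sum(risky_share(rounds, epsilon, rng) for _ in range(runs)) / runs


share = average_risky_share()
```

    Although both arms have expectation 0, the risky arm is sampled rarely once its empirical mean dips below zero, so its estimate is slow to recover; averaged over runs, its share of pulls should fall clearly below one half.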
    Image2Gif: Generating Continuous Realistic Animations with Warping NODEs. (arXiv:2205.04519v1 [cs.CV])
    Generating smooth animations from a limited number of sequential observations has a number of applications in vision. For example, it can be used to increase the number of frames per second, or to generate a new trajectory based only on the first and last frames, e.g. a motion of face emotions. Despite the discrete observed data (frames), the problem of generating a new trajectory is a continuous problem. In addition, to be perceptually realistic, the domain of an image should not alter drastically through the trajectory of changes. In this paper, we propose a new framework, Warping Neural ODE, for generating a smooth animation (video frame interpolation) in a continuous manner, given two ("farther apart") frames, denoting the start and the end of the animation. The key feature of our framework is utilizing the continuous spatial transformation of the image based on the vector field, derived from a system of differential equations. This allows us to achieve the smoothness and the realism of an animation with infinitely small time steps between the frames. We show the application of our work in generating an animation given two frames, in different training settings, including Generative Adversarial Network (GAN) and with $L_2$ loss.  ( 2 min )
    How Does Frequency Bias Affect the Robustness of Neural Image Classifiers against Common Corruption and Adversarial Perturbations?. (arXiv:2205.04533v1 [cs.LG])
    Model robustness is vital for the reliable deployment of machine learning models in real-world applications. Recent studies have shown that data augmentation can result in models over-relying on features in the low-frequency domain, sacrificing performance against low-frequency corruptions, highlighting a connection between frequency and robustness. Here, we take one step further to more directly study the frequency bias of a model through the lens of its Jacobians and its implications for model robustness. To achieve this, we propose Jacobian frequency regularization for models' Jacobians to have a larger ratio of low-frequency components. Through experiments on four image datasets, we show that biasing classifiers towards low (high)-frequency components can bring performance gain against high (low)-frequency corruption and adversarial perturbation, albeit with a tradeoff in performance for low (high)-frequency corruption. Our approach elucidates a more direct connection between the frequency bias and robustness of deep learning models.  ( 2 min )
    Nightly Automobile Claims Prediction from Telematics-Derived Features: A Multilevel Approach. (arXiv:2205.04616v1 [cs.LG])
    In recent years it has become possible to collect GPS data from drivers and to incorporate this data into automobile insurance pricing for the driver. This data is continuously collected and processed nightly into metadata consisting of mileage and time summaries of each discrete trip taken, and a set of behavioral scores describing attributes of the trip (e.g., driver fatigue or driver distraction). We examine whether it can be used to identify periods of increased risk by successfully classifying trips that occur immediately before a trip in which there was an incident leading to a claim for that driver. Identification of periods of increased risk for a driver is valuable because it creates an opportunity for intervention and, potentially, avoidance of a claim. We examine metadata for each trip a driver takes and train a classifier to predict whether \textit{the following trip} is one in which a claim occurs for that driver. By achieving an area under the receiver-operator characteristic above 0.6, we show that it is possible to predict claims in advance. Additionally, we compare the predictive power, as measured by the area under the receiver-operator characteristic, of XGBoost classifiers trained to predict whether a driver will have a claim using exposure features such as driven miles, and those trained using behavioral features such as a computed speed score.  ( 2 min )
    A Song of (Dis)agreement: Evaluating the Evaluation of Explainable Artificial Intelligence in Natural Language Processing. (arXiv:2205.04559v1 [cs.CL])
    There has been significant debate in the NLP community about whether or not attention weights can be used as an explanation - a mechanism for interpreting how important each input token is for a particular prediction. The validity of "attention as explanation" has so far been evaluated by computing the rank correlation between attention-based explanations and existing feature attribution explanations using LSTM-based models. In our work, we (i) compare the rank correlation between five more recent feature attribution methods and two attention-based methods, on two types of NLP tasks, and (ii) extend this analysis to also include transformer-based models. We find that attention-based explanations do not correlate strongly with any recent feature attribution methods, regardless of the model or task. Furthermore, we find that none of the tested explanations correlate strongly with one another for the transformer-based model, leading us to question the underlying assumption that we should measure the validity of attention-based explanations based on how well they correlate with existing feature attribution explanation methods. After conducting experiments on five datasets using two different models, we argue that the community should stop using rank correlation as an evaluation metric for attention-based explanations. We suggest that researchers and practitioners should instead test various explanation methods and employ a human-in-the-loop process to determine if the explanations align with human intuition for the particular use case at hand.  ( 2 min )
    Sentence-level Privacy for Document Embeddings. (arXiv:2205.04605v1 [cs.LG])
    User language data can contain highly sensitive personal content. As such, it is imperative to offer users a strong and interpretable privacy guarantee when learning from their data. In this work, we propose SentDP: pure local differential privacy at the sentence level for a single user document. We propose a novel technique, DeepCandidate, that combines concepts from robust statistics and language modeling to produce high-dimensional, general-purpose $\epsilon$-SentDP document embeddings. This guarantees that any single sentence in a document can be substituted with any other sentence while keeping the embedding $\epsilon$-indistinguishable. Our experiments indicate that these private document embeddings are useful for downstream tasks like sentiment analysis and topic classification and even outperform baseline methods with weaker guarantees like word-level Metric DP.  ( 2 min )
    Towards Optimal VPU Compiler Cost Modeling by using Neural Networks to Infer Hardware Performances. (arXiv:2205.04586v1 [cs.LG])
    Calculating the most efficient schedule of work in a neural network compiler is a difficult task. There are many parameters to be accounted for that can positively or adversely affect that schedule depending on their configuration: how work is shared between distributed targets, the subdivision of tensors to fit in memory, toggling the enablement of optimizations, etc. Traditionally, neural network compilers determine how to set these values by building a graph of choices and choosing the path with minimal 'cost'. These choices and their corresponding costs are usually determined by an algorithm crafted by engineers with a deep knowledge of the target platform. However, when the number of options available to a compiler is large, it is very difficult to ensure that these models consistently produce an optimal schedule for all scenarios, whilst still completing compilation in an acceptable timeframe. This paper presents 'VPUNN' - a neural network-based cost model trained on low-level task profiling that consistently outperforms the state-of-the-art cost modeling in Intel's line of VPU processors.  ( 2 min )
    KEMP: Keyframe-Based Hierarchical End-to-End Deep Model for Long-Term Trajectory Prediction. (arXiv:2205.04624v1 [cs.CV])
    Predicting future trajectories of road agents is a critical task for autonomous driving. Recent goal-based trajectory prediction methods, such as DenseTNT and PECNet, have shown good performance on prediction tasks on public datasets. However, they usually require complicated goal-selection algorithms and optimization. In this work, we propose KEMP, a hierarchical end-to-end deep learning framework for trajectory prediction. At the core of our framework is keyframe-based trajectory prediction, where keyframes are representative states that trace out the general direction of the trajectory. KEMP first predicts keyframes conditioned on the road context, and then fills in intermediate states conditioned on the keyframes and the road context. Under our general framework, goal-conditioned methods are special cases in which the number of keyframes equals one. Unlike goal-conditioned methods, our keyframe predictor is learned automatically and does not require hand-crafted goal-selection algorithms. We evaluate our model on public benchmarks and our model ranked 1st on Waymo Open Motion Dataset Leaderboard (as of September 1, 2021).  ( 2 min )
    Calibrating for Class Weights by Modeling Machine Learning. (arXiv:2205.04613v1 [cs.LG])
    A much studied issue is the extent to which the confidence scores provided by machine learning algorithms are calibrated to ground truth probabilities. Our starting point is that calibration is seemingly incompatible with class weighting, a technique often employed when one class is less common (class imbalance) or with the hope of achieving some external objective (cost-sensitive learning). We provide a model-based explanation for this incompatibility and use our anthropomorphic model to generate a simple method of recovering likelihoods from an algorithm that is miscalibrated due to class weighting. We validate this approach in the binary pneumonia detection task of Rajpurkar, Irvin, Zhu, et al. (2017).  ( 2 min )
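    One way to see why class weighting and calibration conflict is the standard odds-ratio argument: if positives are up-weighted by a factor w, a model that fits the weighted data perfectly reports q = wp / (wp + 1 - p) instead of the true probability p. The sketch below inverts that relation. It is the textbook correction under this idealized assumption, offered as an illustration rather than as the paper's method; the function names are hypothetical.

```python
def weighted_score(p, w):
    """Score an ideal model would report when positives carry weight w."""
    return w * p / (w * p + (1.0 - p))


def recover_probability(q, w):
    """Invert the weighting: map the reported score q back to a probability."""
    return q / (q + w * (1.0 - q))


# A 10% true probability with positives weighted 9x is reported as 50%.
reported = weighted_score(0.1, 9.0)
recovered = recover_probability(reported, 9.0)
```

    The correction divides the reported odds by w, so it only requires knowing the weight used during training.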
    Towards Intersectionality in Machine Learning: Including More Identities, Handling Underrepresentation, and Performing Evaluation. (arXiv:2205.04610v1 [cs.LG])
    Research in machine learning fairness has historically considered a single binary demographic attribute; however, the reality is of course far more complicated. In this work, we grapple with questions that arise along three stages of the machine learning pipeline when incorporating intersectionality as multiple demographic attributes: (1) which demographic attributes to include as dataset labels, (2) how to handle the progressively smaller size of subgroups during model training, and (3) how to move beyond existing evaluation metrics when benchmarking model fairness for more subgroups. For each question, we provide thorough empirical evaluation on tabular datasets derived from the US Census, and present constructive recommendations for the machine learning community. First, we advocate for supplementing domain knowledge with empirical validation when choosing which demographic attribute labels to train on, while always evaluating on the full set of demographic attributes. Second, we warn against using data imbalance techniques without considering their normative implications and suggest an alternative using the structure in the data. Third, we introduce new evaluation metrics which are more appropriate for the intersectional setting. Overall, we provide substantive suggestions on three necessary (albeit not sufficient!) considerations when incorporating intersectionality into machine learning.  ( 2 min )
    Rethinking Fairness: An Interdisciplinary Survey of Critiques of Hegemonic ML Fairness Approaches. (arXiv:2205.04460v1 [cs.LG])
    This survey article assesses and compares existing critiques of current fairness-enhancing technical interventions into machine learning (ML) that draw from a range of non-computing disciplines, including philosophy, feminist studies, critical race and ethnic studies, legal studies, anthropology, and science and technology studies. It bridges epistemic divides in order to offer an interdisciplinary understanding of the possibilities and limits of hegemonic computational approaches to ML fairness for producing just outcomes for society's most marginalized. The article is organized according to nine major themes of critique wherein these different fields intersect: 1) how "fairness" in AI fairness research gets defined; 2) how problems for AI systems to address get formulated; 3) the impacts of abstraction on how AI tools function and its propensity to lead to technological solutionism; 4) how racial classification operates within AI fairness research; 5) the use of AI fairness measures to avoid regulation and engage in ethics washing; 6) an absence of participatory design and democratic deliberation in AI fairness considerations; 7) data collection practices that entrench "bias," are non-consensual, and lack transparency; 8) the predatory inclusion of marginalized groups into AI systems; and 9) a lack of engagement with AI's long-term social and ethical outcomes. Drawing from these critiques, the article concludes by imagining future ML fairness research directions that actively disrupt entrenched power dynamics and structural injustices in society.  ( 2 min )
    A Probabilistic Generative Model of Free Categories. (arXiv:2205.04545v1 [cs.AI])
    Applied category theory has recently developed libraries for computing with morphisms in interesting categories, while machine learning has developed ways of learning programs in interesting languages. Taking the analogy between categories and languages seriously, this paper defines a probabilistic generative model of morphisms in free monoidal categories over domain-specific generating objects and morphisms. The paper shows how acyclic directed wiring diagrams can model specifications for morphisms, which the model can use to generate morphisms. Amortized variational inference in the generative model then enables learning of parameters (by maximum likelihood) and inference of latent variables (by Bayesian inversion). A concrete experiment shows that the free category prior achieves competitive reconstruction performance on the Omniglot dataset.  ( 2 min )
    Towards a multi-stakeholder value-based assessment framework for algorithmic systems. (arXiv:2205.04525v1 [cs.LG])
    In an effort to regulate Machine Learning-driven (ML) systems, current auditing processes mostly focus on detecting harmful algorithmic biases. While these strategies have proven to be impactful, some values outlined in documents dealing with ethics in ML-driven systems are still underrepresented in auditing processes. Such unaddressed values mainly deal with contextual factors that cannot be easily quantified. In this paper, we develop a value-based assessment framework that is not limited to bias auditing and that covers prominent ethical principles for algorithmic systems. Our framework presents a circular arrangement of values with two bipolar dimensions that make common motivations and potential tensions explicit. In order to operationalize these high-level principles, values are then broken down into specific criteria and their manifestations. However, some of these value-specific criteria are mutually exclusive and require negotiation. As opposed to some other auditing frameworks that merely rely on ML researchers' and practitioners' input, we argue that it is necessary to include stakeholders that present diverse standpoints to systematically negotiate and consolidate value and criteria tensions. To that end, we map stakeholders with different insight needs, and assign tailored means for communicating value manifestations to them. We, therefore, contribute to current ML auditing practices with an assessment framework that visualizes closeness and tensions between values and we give guidelines on how to operationalize them, while opening up the evaluation and deliberation process to a wide range of stakeholders.  ( 2 min )
    Selectively Contextual Bandits. (arXiv:2205.04528v1 [cs.LG])
    Contextual bandits are widely used in industrial personalization systems. These online learning frameworks learn a treatment assignment policy in the presence of treatment effects that vary with the observed contextual features of the users. While personalization creates a rich user experience that reflects individual interests, there are benefits of a shared experience across a community that enables participation in the zeitgeist. Such benefits are emergent through network effects and are not captured in regret metrics typically employed in evaluating bandits. To balance these needs, we propose a new online learning algorithm that preserves benefits of personalization while increasing the commonality in treatments across users. Our approach selectively interpolates between a contextual bandit algorithm and a context-free multi-armed bandit and leverages the contextual information for a treatment decision only if it promises significant gains. Apart from helping users of personalization systems balance their experience between the individualized and shared, simplifying the treatment assignment policy by making it selectively reliant on the context can help improve the rate of learning in some cases. We evaluate our approach in a classification setting using public datasets and show the benefits of the hybrid policy.  ( 2 min )
    PinnerFormer: Sequence Modeling for User Representation at Pinterest. (arXiv:2205.04507v1 [cs.LG])
    Sequential models have become increasingly popular in powering personalized recommendation systems over the past several years. These approaches traditionally model a user's actions on a website as a sequence to predict the user's next action. While theoretically simplistic, these models are quite challenging to deploy in production, commonly requiring streaming infrastructure to reflect the latest user activity and potentially managing mutable data for encoding a user's hidden state. Here we introduce PinnerFormer, a user representation trained to predict a user's future long-term engagement using a sequential model of a user's recent actions. Unlike prior approaches, we adapt our modeling to a batch infrastructure via our new dense all-action loss, modeling long-term future actions instead of next action prediction. We show that by doing so, we significantly close the gap between batch user embeddings that are generated once a day and realtime user embeddings generated whenever a user takes an action. We describe our design decisions via extensive offline experimentation and ablations and validate the efficacy of our approach in A/B experiments showing substantial improvements in Pinterest's user retention and engagement when comparing PinnerFormer against our previous user representation. PinnerFormer is deployed in production as of Fall 2021.  ( 2 min )
    Insights into the origin of halo mass profiles from machine learning. (arXiv:2205.04474v1 [astro-ph.CO])
    The mass distribution of dark matter haloes is the result of the hierarchical growth of initial density perturbations through mass accretion and mergers. We use an interpretable machine-learning framework to provide physical insights into the origin of the spherically-averaged mass profile of dark matter haloes. We train a gradient-boosted-trees algorithm to predict the final mass profiles of cluster-sized haloes, and measure the importance of the different inputs provided to the algorithm. We find two primary scales in the initial conditions (ICs) that impact the final mass profile: the density at approximately the scale of the haloes' Lagrangian patch $R_L$ ($R\sim 0.7\, R_L$) and that in the large-scale environment ($R\sim 1.7~R_L$). The model also identifies three primary time-scales in the halo assembly history that affect the final profile: (i) the formation time of the virialized, collapsed material inside the halo, (ii) the dynamical time, which captures the dynamically unrelaxed, infalling component of the halo over its first orbit, (iii) a third, most recent time-scale, which captures the impact on the outer profile of recent massive merger events. While the inner profile retains memory of the ICs, this information alone is insufficient to yield accurate predictions for the outer profile. As we add information about the haloes' mass accretion history, we find a significant improvement in the predicted profiles at all radii. Our machine-learning framework provides novel insights into the role of the ICs and the mass assembly history in determining the final mass profile of cluster-sized haloes.  ( 2 min )
    OTFPF: Optimal Transport-Based Feature Pyramid Fusion Network for Brain Age Estimation with 3D Overlapped ConvNeXt. (arXiv:2205.04684v1 [cs.CV])
    The chronological age of a healthy brain can be predicted using deep neural networks from T1-weighted magnetic resonance images (T1 MRIs), and the predicted brain age could serve as an effective biomarker for detecting aging-related diseases or disorders. In this paper, we propose an end-to-end neural network architecture, referred to as optimal transport based feature pyramid fusion (OTFPF) network, for brain age estimation with T1 MRIs. The OTFPF consists of three types of modules: Optimal Transport based Feature Pyramid Fusion (OTFPF) module, 3D overlapped ConvNeXt (3D OL-ConvNeXt) module and fusion module. These modules strengthen the OTFPF network's understanding of each brain's semi-multimodal and multi-level feature pyramid information, and significantly improve its estimation performance. Compared with recent state-of-the-art models, the proposed OTFPF converges faster and performs better. The experiments with 11,728 MRIs from subjects aged 3-97 years show that the OTFPF network could provide accurate brain age estimation, yielding mean absolute error (MAE) of 2.097, Pearson's correlation coefficient (PCC) of 0.993 and Spearman's rank correlation coefficient (SRCC) of 0.989, between the estimated and chronological ages. Widespread quantitative experiments and ablation experiments demonstrate the superiority and rationality of the OTFPF network. The code and implementation details will be released on GitHub: https://github.com/ZJU-Brain/OTFPF after final decision.  ( 2 min )
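    For readers unfamiliar with the three reported metrics, the sketch below computes MAE, Pearson's correlation and Spearman's rank correlation between estimated and chronological ages using only the standard library. The ages are made-up toy values, and the Spearman helper assumes no tied values; neither comes from the paper.

```python
import math


def mae(estimated, actual):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(e - a) for e, a in zip(estimated, actual)) / len(actual)


def pearson(x, y):
    """Pearson's correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """Rank positions starting at 1 (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman's rank correlation: Pearson's correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


estimated_age = [10, 22, 29, 41]      # toy predictions
chronological_age = [12, 20, 30, 40]  # toy ground truth

toy_mae = mae(estimated_age, chronological_age)
toy_pcc = pearson(estimated_age, chronological_age)
toy_srcc = spearman(estimated_age, chronological_age)
```

    MAE is in years and penalizes absolute deviation, while the two correlation coefficients measure linear and monotonic agreement respectively, which is why a model can score high on PCC and SRCC and still have a non-trivial MAE.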
    Variational Inference MPC using Normalizing Flows and Out-of-Distribution Projection. (arXiv:2205.04667v1 [cs.RO])
    We propose a Model Predictive Control (MPC) method for collision-free navigation that uses amortized variational inference to approximate the distribution of optimal control sequences by training a normalizing flow conditioned on the start, goal and environment. This representation allows us to learn a distribution that accounts for both the dynamics of the robot and complex obstacle geometries. We can then sample from this distribution to produce control sequences which are likely to be both goal-directed and collision-free as part of our proposed FlowMPPI sampling-based MPC method. However, when deploying this method, the robot may encounter an out-of-distribution (OOD) environment, i.e. one which is radically different from those used in training. In such cases, the learned flow cannot be trusted to produce low-cost control sequences. To generalize our method to OOD environments we also present an approach that performs projection on the representation of the environment as part of the MPC process. This projection changes the environment representation to be more in-distribution while also optimizing trajectory quality in the true environment. Our simulation results on a 2D double-integrator and a 3D 12DoF underactuated quadrotor suggest that FlowMPPI with projection outperforms state-of-the-art MPC baselines on both in-distribution and OOD environments, including OOD environments generated from real-world data.  ( 2 min )
    Crypto Pump and Dump via Deep Learning Techniques. (arXiv:2205.04646v1 [cs.LG])
    Although cryptocurrencies themselves have experienced an astonishing rate of adoption over the last decade, cryptocurrency fraud detection is a heavily under-researched problem area. Of all fraudulent activity regarding cryptocurrencies, pump and dump schemes are some of the most common. Though some studies have been done on these kinds of scams in the stock market, the lack of labelled stock data and the volatility unique to the cryptocurrency space constrain the applicability of studies on the stock market toward this problem domain. Furthermore, the only work done in this space thus far has been either statistical in nature, or has been concerned with classical machine learning models such as random forest trees. We propose the novel application of two existing neural network architectures to this problem domain and show that deep learning solutions can significantly outperform all other existing pump and dump detection methods for cryptocurrencies.  ( 2 min )
    A 14uJ/Decision Keyword Spotting Accelerator with In-SRAM-Computing and On Chip Learning for Customization. (arXiv:2205.04665v1 [cs.AR])
    Keyword spotting has gained popularity as a natural way to interact with consumer devices in recent years. However, because of its always-on nature and the variety of speech, it necessitates a low-power design as well as user customization. This paper describes a low-power, energy-efficient keyword spotting accelerator with SRAM based in-memory computing (IMC) and on-chip learning for user customization. IMC, however, is constrained by macro size, limited precision, and non-ideal effects. To address these issues, this paper proposes bias compensation and fine-tuning using an IMC-aware model design. Furthermore, because learning with low-precision edge devices results in zero error and gradient values due to quantization, this paper proposes error scaling and small gradient accumulation to achieve the same accuracy as ideal model training. The simulation results show that we can recover the accuracy loss from 51.08% to 89.76% with compensation and fine-tuning, and further improve to 96.71% with user customization. The chip implementation can successfully run the model with only 14 uJ per decision. Compared to state-of-the-art works, the presented design has higher energy efficiency with additional on-chip model customization capabilities for higher accuracy.  ( 2 min )
    An Edge-Cloud Integrated Framework for Flexible and Dynamic Stream Analytics. (arXiv:2205.04622v1 [cs.DC])
    With the popularity of the Internet of Things (IoT), edge computing and cloud computing, more and more stream analytics applications are being developed, including real-time trend prediction and object detection on top of IoT sensing data. One popular type of stream analytics is time series or sequence data prediction and forecasting based on recurrent neural network (RNN) deep learning models. Unlike traditional analytics, which assumes that the data to be processed are available ahead of time and will not change, stream analytics deals with data that are generated continuously and whose trend/distribution could change (aka concept drift), which will cause prediction/forecasting accuracy to drop over time. Another challenge is to find the best resource provisioning for stream analytics to achieve good overall latency. In this paper, we study how to best leverage edge and cloud resources to achieve better accuracy and latency for RNN-based stream analytics. We propose a novel edge-cloud integrated framework for hybrid stream analytics that supports low latency inference on the edge and high capacity training on the cloud. We study three flexible deployments of our hybrid learning framework, namely edge-centric, cloud-centric and edge-cloud integrated. Further, our hybrid learning framework can dynamically combine inference results from an RNN model pre-trained on historical data and another RNN model re-trained periodically on the most recent data. Using real-world and simulated stream datasets, our experiments show the proposed edge-cloud deployment is the best among all three deployment types in terms of latency. For accuracy, the experiments show our dynamic learning approach performs the best among all learning approaches for all three concept drift scenarios.  ( 2 min )
    Robust Learning of Parsimonious Deep Neural Networks. (arXiv:2205.04650v1 [cs.LG])
    We propose a simultaneous learning and pruning algorithm capable of identifying and eliminating irrelevant structures in a neural network during the early stages of training. Thus, the computational cost of subsequent training iterations, besides that of inference, is considerably reduced. Our method, based on variational inference principles, learns the posterior distribution of Bernoulli random variables multiplying the units/filters similarly to adaptive dropout. We derive a novel hyper-prior distribution over the prior parameters that is crucial for their optimal selection in a way that the Bernoulli parameters practically converge to either 0 or 1, establishing a deterministic final network. Our algorithm is robust in the sense that it achieves consistent pruning levels and prediction accuracy regardless of weight initialization or the size of the starting network. We provide an analysis of its convergence properties establishing theoretical and practical pruning conditions. We evaluate the proposed algorithm on the MNIST data set and commonly used fully connected and convolutional LeNet architectures. The simulations show that our method achieves pruning levels on par with state-of-the-art methods for structured pruning, while maintaining better test accuracy and, more importantly, in a manner robust with respect to network initialization and initial size.  ( 2 min )
    Real-Time Wearable Gait Phase Segmentation For Running And Walking. (arXiv:2205.04668v1 [cs.LG])
    Previous approaches that treat gait phase detection as a convolutional neural network (CNN) based classification task require cumbersome manual setting of the time delay or heavily overlapped sliding windows to accurately classify each phase under different test cases, which is not suitable for streaming Inertial-Measurement-Unit (IMU) sensor data and fails to adapt to different scenarios. This paper presents a segmentation based gait phase detection method that uses only a single six-axis IMU sensor and can easily adapt to both walking and running at various speeds. The proposed segmentation uses a CNN with a gait phase aware receptive field setting and an IMU oriented processing order, which can fit IMU sampling rates as high as 1000Hz for high accuracy and as low as 20Hz for real-time calculation. On 20Hz sampling rate data, the proposed model achieves an average error of 8.86 ms in swing time and 9.12 ms in stance time, with 96.44% accuracy of gait phase detection and 99.97% accuracy of stride detection. Its real-time implementation on a mobile phone takes only 36 ms for 1 second of sensor data.  ( 2 min )
    Deep Gait Tracking With Inertial Measurement Unit. (arXiv:2205.04666v1 [cs.LG])
    This paper presents a convolutional neural network based foot motion tracking method that uses only six-axis Inertial-Measurement-Unit (IMU) sensor data. The presented approach can adapt to various walking conditions by adopting differential and window based input. The training data are further augmented by sliding and random window sampling on the IMU sensor data to increase data diversity for better performance. The proposed approach fuses predictions of the three-dimensional output into one model. The proposed fused model achieves an average error of 2.30 ± 2.23 cm in the X-axis, 0.91 ± 0.95 cm in the Y-axis and 0.58 ± 0.52 cm in the Z-axis.  ( 2 min )
  • Open

    Flexible variable selection in the presence of missing data. (arXiv:2202.12989v2 [stat.ME] UPDATED)
    In many applications, it is of interest to identify a parsimonious set of features, or panel, from multiple candidates that achieves a desired level of performance in predicting a response. This task is often complicated in practice by missing data arising from the sampling design or other random mechanisms. Most recent work on variable selection in missing data contexts relies in some part on a finite-dimensional statistical model, e.g., a generalized or penalized linear model. In cases where this model is misspecified, the selected variables may not all be truly scientifically relevant and can result in panels with suboptimal classification performance. To address this limitation, we propose several nonparametric variable selection algorithms combined with multiple imputation to develop flexible panels in the presence of missing-at-random data. We outline strategies based on the proposed algorithms that achieve control of commonly used error rates. Through simulations, we show that our proposals have good operating characteristics and result in panels with higher classification performance compared to several existing penalized regression approaches in cases where a generalized linear model is misspecified. Finally, we use the proposed methods to develop biomarker panels for separating pancreatic cysts with differing malignancy potential in a setting where complicated missingness in the biomarkers arose due to limited specimen volumes.  ( 2 min )
    Neural Collapse Under MSE Loss: Proximity to and Dynamics on the Central Path. (arXiv:2106.02073v4 [cs.LG] UPDATED)
    The recently discovered Neural Collapse (NC) phenomenon occurs pervasively in today's deep net training paradigm of driving cross-entropy (CE) loss towards zero. During NC, last-layer features collapse to their class-means, both classifiers and class-means collapse to the same Simplex Equiangular Tight Frame, and classifier behavior collapses to the nearest-class-mean decision rule. Recent works demonstrated that deep nets trained with mean squared error (MSE) loss perform comparably to those trained with CE. As a preliminary, we empirically establish that NC emerges in such MSE-trained deep nets as well through experiments on three canonical networks and five benchmark datasets. We provide, in a Google Colab notebook, PyTorch code for reproducing MSE-NC and CE-NC at https://colab.research.google.com/github/neuralcollapse/neuralcollapse/blob/main/neuralcollapse.ipynb. The analytically-tractable MSE loss offers more mathematical opportunities than the hard-to-analyze CE loss, inspiring us to leverage MSE loss towards the theoretical investigation of NC. We develop three main contributions: (I) We show a new decomposition of the MSE loss into (A) terms directly interpretable through the lens of NC and which assume the last-layer classifier is exactly the least-squares classifier; and (B) a term capturing the deviation from this least-squares classifier. (II) We exhibit experiments on canonical datasets and networks demonstrating that term-(B) is negligible during training. This motivates us to introduce a new theoretical construct: the central path, where the linear classifier stays MSE-optimal for feature activations throughout the dynamics. (III) By studying renormalized gradient flow along the central path, we derive exact dynamics that predict NC.  ( 3 min )
    Nested conformal prediction and quantile out-of-bag ensemble methods. (arXiv:1910.10562v4 [stat.ME] UPDATED)
    Conformal prediction is a popular tool for providing valid prediction sets for classification and regression problems, without relying on any distributional assumptions on the data. While the traditional description of conformal prediction starts with a nonconformity score, we provide an alternate (but equivalent) view that starts with a sequence of nested sets and calibrates them to find a valid prediction set. The nested framework subsumes all nonconformity scores, including recent proposals based on quantile regression and density estimation. While these ideas were originally derived based on sample splitting, our framework seamlessly extends them to other aggregation schemes like cross-conformal, jackknife+ and out-of-bag methods. We use the framework to derive a new algorithm (QOOB, pronounced cube) that combines four ideas: quantile regression, cross-conformalization, ensemble methods and out-of-bag predictions. We develop a computationally efficient implementation of cross-conformal, that is also used by QOOB. In a detailed numerical investigation, QOOB performs either the best or close to the best on all simulated and real datasets. Code for QOOB is available at https://github.com/aigen/QOOB.  ( 2 min )
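    The nested-set view described above can be illustrated with a minimal split-conformal sketch. It is not the QOOB algorithm: it assumes a hypothetical family of nested intervals [prediction - t, prediction + t] and made-up calibration data, and it picks the smallest radius t that covers enough calibration points. The function name `calibrate_radius` and all numbers are illustrative assumptions.

```python
import math

def calibrate_radius(predictions, targets, alpha=0.1):
    """Smallest t such that [pred - t, pred + t] covers enough calibration points."""
    # Score of a point = radius of the smallest nested interval that contains it.
    scores = sorted(abs(y - p) for p, y in zip(predictions, targets))
    n = len(scores)
    k = math.ceil((1 - alpha) * (n + 1))  # rank with the usual finite-sample correction
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return scores[k - 1]

# Hypothetical calibration set: model outputs and observed labels.
cal_preds = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
cal_targets = [1.1, 1.8, 3.3, 3.6, 5.5, 5.4, 7.7, 7.2, 9.9]

t = calibrate_radius(cal_preds, cal_targets, alpha=0.2)
new_prediction = 4.5  # hypothetical model output for a new input
interval = (new_prediction - t, new_prediction + t)
```

    Swapping the fixed-width intervals for quantile-regression bands or density level sets changes only how the score of each calibration point is computed; the calibration step stays the same.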
    Adaptation Strategies for Automated Machine Learning on Evolving Data. (arXiv:2006.06480v3 [cs.LG] UPDATED)
    Automated Machine Learning (AutoML) systems have been shown to efficiently build good models for new datasets. However, it is often not clear how well they can adapt when the data evolves over time. The main goal of this study is to understand the effect of data stream challenges such as concept drift on the performance of AutoML methods, and which adaptation strategies can be employed to make them more robust. To that end, we propose 6 concept drift adaptation strategies and evaluate their effectiveness on different AutoML approaches. We do this for a variety of AutoML approaches for building machine learning pipelines, including those that leverage Bayesian optimization, genetic programming, and random search with automated stacking. These are evaluated empirically on real-world and synthetic data streams with different types of concept drift. Based on this analysis, we propose ways to develop more sophisticated and robust AutoML techniques.  ( 2 min )
    Mean Estimation from One-Bit Measurements. (arXiv:1901.03403v4 [cs.IT] UPDATED)
    We consider the problem of estimating the mean of a symmetric log-concave distribution under the constraint that only a single bit per sample from this distribution is available to the estimator. We study the mean squared error as a function of the sample size (and hence the number of bits). We consider three settings: first, a centralized setting, where an encoder may release $n$ bits given a sample of size $n$, and for which there is no asymptotic penalty for quantization; second, an adaptive setting in which each bit is a function of the current observation and previously recorded bits, where we show that the optimal relative efficiency compared to the sample mean is precisely the efficiency of the median; lastly, we show that in a distributed setting where each bit is only a function of a local sample, no estimator can achieve optimal efficiency uniformly over the parameter space. We additionally complement our results in the adaptive setting by showing that one round of adaptivity is sufficient to achieve optimal mean-square error.  ( 2 min )
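    To make the adaptive setting concrete, here is a small sketch of a generic sign-based stochastic-approximation scheme. It is not the estimator analyzed in the paper. Each sample contributes one bit (whether it lies above the current estimate), and the estimate drifts toward the median, which equals the mean for a symmetric distribution. The function name, step size and Gaussian test data are assumptions made for illustration.

```python
import random

def one_bit_adaptive_mean(samples, step=2.0, start=0.0):
    """Track the center of a symmetric distribution using one bit per sample."""
    theta = start
    for n, x in enumerate(samples, start=1):
        bit = 1 if x > theta else 0          # the only information kept from x
        theta += (step / n) * (2 * bit - 1)  # move toward the side the bit indicates
    return theta

rng = random.Random(0)                        # fixed seed for reproducibility
data = [rng.gauss(2.0, 1.0) for _ in range(20000)]  # synthetic samples, true mean 2.0
estimate = one_bit_adaptive_mean(data)
```

    The bit sent at step n depends on the current observation and, through theta, on all previously recorded bits, which is what distinguishes the adaptive setting from the distributed one.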
    A High Throughput Generative Vector Autoregression Model for Stochastic Synapses. (arXiv:2205.05053v1 [cs.NE])
    By imitating the synaptic connectivity and plasticity of the brain, emerging electronic nanodevices offer new opportunities as the building blocks of neuromorphic systems. One challenge for large-scale simulations of computational architectures based on emerging devices is to accurately capture device response, hysteresis, noise, and the covariance structure in the temporal domain as well as between the different device parameters. We address this challenge with a high throughput generative model for synaptic arrays that is based on a recently available type of electrical measurement data for resistive memory cells. We map this real-world data onto a vector autoregressive stochastic process to accurately reproduce the device parameters and their cross-correlation structure. While closely matching the measured data, our model is still very fast; we provide parallelized implementations for both CPUs and GPUs and demonstrate array sizes above one billion cells and throughputs exceeding one hundred million weight updates per second, above the pixel rate of a 30 frames/s 4K video stream.  ( 2 min )
    Augmenting Physical Models with Deep Networks for Complex Dynamics Forecasting. (arXiv:2010.04456v6 [stat.ML] UPDATED)
    Forecasting complex dynamical phenomena in settings where only partial knowledge of their dynamics is available is a prevalent problem across various scientific fields. While purely data-driven approaches are arguably insufficient in this context, standard physical modeling based approaches tend to be over-simplistic, inducing non-negligible errors. In this work, we introduce the APHYNITY framework, a principled approach for augmenting incomplete physical dynamics described by differential equations with deep data-driven models. It consists in decomposing the dynamics into two components: a physical component accounting for the dynamics for which we have some prior knowledge, and a data-driven component accounting for errors of the physical model. The learning problem is carefully formulated such that the physical model explains as much of the data as possible, while the data-driven component only describes information that cannot be captured by the physical model, no more, no less. This not only provides the existence and uniqueness for this decomposition, but also ensures interpretability and benefits generalization. Experiments made on three important use cases, each representative of a different family of phenomena, i.e. reaction-diffusion equations, wave equations and the non-linear damped pendulum, show that APHYNITY can efficiently leverage approximate physical models to accurately forecast the evolution of the system and correctly identify relevant physical parameters. Code is available at https://github.com/yuan-yin/APHYNITY .  ( 3 min )
    A Robust and Flexible EM Algorithm for Mixtures of Elliptical Distributions with Missing Data. (arXiv:2201.12020v2 [stat.ML] UPDATED)
    This paper tackles the problem of missing data imputation for noisy and non-Gaussian data. A classical imputation method, the Expectation Maximization (EM) algorithm for Gaussian mixture models, has shown interesting properties when compared to other popular approaches such as those based on k-nearest neighbors or on multiple imputations by chained equations. However, Gaussian mixture models are known to be non-robust to heterogeneous data, which can lead to poor estimation performance when the data is contaminated by outliers or follows non-Gaussian distributions. To overcome this issue, a new EM algorithm is investigated for mixtures of elliptical distributions with the property of handling potential missing data. This paper shows that this problem reduces to the estimation of a mixture of Angular Gaussian distributions under generic assumptions (i.e., each sample is drawn from a mixture of elliptical distributions, which is possibly different from one sample to another). In that case, the complete-data likelihood associated with mixtures of elliptical distributions is well adapted to the EM framework with missing data thanks to its conditional distribution, which is shown to be a multivariate $t$-distribution. Experimental results on synthetic data demonstrate that the proposed algorithm is robust to outliers and can be used with non-Gaussian data. Furthermore, experiments conducted on real-world datasets show that this algorithm is very competitive when compared to other classical imputation methods.  ( 2 min )
    Nearly Minimax Optimal Regret for Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation. (arXiv:2102.07301v2 [cs.LG] UPDATED)
    We study reinforcement learning in an infinite-horizon average-reward setting with linear function approximation, where the transition probability function of the underlying Markov Decision Process (MDP) admits a linear form over a feature mapping of the current state, action, and next state. We propose a new algorithm UCRL2-VTR, which can be seen as an extension of the UCRL2 algorithm with linear function approximation. We show that UCRL2-VTR with Bernstein-type bonus can achieve a regret of $\tilde{O}(d\sqrt{DT})$, where $d$ is the dimension of the feature mapping, $T$ is the horizon, and $\sqrt{D}$ is the diameter of the MDP. We also prove a matching lower bound $\tilde{\Omega}(d\sqrt{DT})$, which suggests that the proposed UCRL2-VTR is minimax optimal up to logarithmic factors. To the best of our knowledge, our algorithm is the first nearly minimax optimal RL algorithm with function approximation in the infinite-horizon average-reward setting.  ( 2 min )
    Modeling Regime Shifts in Multiple Time Series. (arXiv:2109.09692v3 [cs.LG] UPDATED)
    We investigate the problem of discovering and modeling regime shifts in an ecosystem comprising multiple time series known as co-evolving time series. Regime shifts refer to the changing behaviors exhibited by series at different time intervals. Learning these changing behaviors is a key step toward time series forecasting. While advances have been made, existing methods suffer from one or more of the following shortcomings: (1) failure to take relationships between time series into consideration for discovering regimes in multiple time series; (2) lack of an effective approach that models time-dependent behaviors exhibited by series; (3) difficulties in handling data discontinuities which may be informative. Most of the existing methods are unable to handle all three of these issues in a unified framework. This, therefore, motivates our effort to devise a principled approach for modeling interactions and time-dependency in co-evolving time series. Specifically, we model an ecosystem of multiple time series by summarizing the heavy ensemble of time series into a lighter and more meaningful structure called a mapping grid. By using the mapping grid, our model first learns time series behavioral dependencies through a dynamic network representation, then learns the regime transition mechanism via a full time-dependent Cox regression model. The originality of our approach lies in modeling interactions between time series in regime identification and in modeling time-dependent regime transition probabilities, usually assumed to be static in existing work.  ( 2 min )
    On the power of conditional independence testing under model-X. (arXiv:2005.05506v4 [math.ST] UPDATED)
    For testing conditional independence (CI) of a response Y and a predictor X given covariates Z, the recently introduced model-X (MX) framework has been the subject of active methodological research, especially in the context of MX knockoffs and their successful application to genome-wide association studies. In this paper, we study the power of MX CI tests, yielding quantitative explanations for empirically observed phenomena and novel insights to guide the design of MX methodology. We show that any valid MX CI test must also be valid conditionally on Y and Z; this conditioning allows us to reformulate the problem as testing a point null hypothesis involving the conditional distribution of X. The Neyman-Pearson lemma then implies that the conditional randomization test (CRT) based on a likelihood statistic is the most powerful MX CI test against a point alternative. We also obtain a related optimality result for MX knockoffs. Switching to an asymptotic framework with arbitrarily growing covariate dimension, we derive an expression for the limiting power of the CRT against local semiparametric alternatives in terms of the prediction error of the machine learning algorithm on which its test statistic is based. Finally, we exhibit a resampling-free test with uniform asymptotic Type-I error control under the assumption that only the first two moments of X given Z are known, a significant relaxation of the MX assumption.  ( 2 min )
    Gradient flows on graphons: existence, convergence, continuity equations. (arXiv:2111.09459v2 [math.PR] UPDATED)
    Wasserstein gradient flows on probability measures have found a host of applications in various optimization problems. They typically arise as the continuum limit of exchangeable particle systems evolving by some mean-field interaction involving a gradient-type potential. However, in many problems, such as in multi-layer neural networks, the so-called particles are edge weights on large graphs whose nodes are exchangeable. Such large graphs are known to converge to continuum limits called graphons as their size grows to infinity. We show that the Euclidean gradient flow of a suitable function of the edge-weights converges to a novel continuum limit given by a curve on the space of graphons that can be appropriately described as a gradient flow or, more technically, a curve of maximal slope. Several natural functions on graphons, such as homomorphism functions and the scalar entropy, are covered by our set-up, and the examples have been worked out in detail.  ( 2 min )
    Representing Hierarchical Structure by Using Cone Embedding. (arXiv:2102.08014v2 [cs.AI] UPDATED)
    Graph embedding is becoming an important method with applications in various areas, including social networks and knowledge graph completion. In particular, Poincaré embedding has been proposed to capture the hierarchical structure of graphs, and its effectiveness has been reported. However, most of the existing methods have isometric mappings in the embedding space, and the choice of the origin point can be arbitrary. This fact is not desirable when the distance from the origin is used as an indicator of hierarchy, as in the case of Poincaré embedding. In this paper, we propose cone embedding, an embedding method in a metric cone, which solves these problems, and we gain further benefits: 1) we provide an indicator of hierarchical information that is both geometrically and intuitively natural to interpret, and 2) we can extract the hierarchical structure from a graph embedding output of other methods by learning additional one-dimensional parameters.  ( 2 min )
    Community Detection with a Subsampled Semidefinite Program. (arXiv:2102.01419v3 [math.OC] UPDATED)
    Semidefinite programming is an important tool to tackle several problems in data science and signal processing, including clustering and community detection. However, semidefinite programs are often slow in practice, so speed-up techniques such as sketching are often considered. In the context of community detection in the stochastic block model, Mixon and Xie have recently proposed a sketching framework in which a semidefinite program is solved only on a subsampled subgraph of the network, giving rise to significant computational savings. In this short paper, we provide a positive answer to a conjecture of Mixon and Xie about the statistical limits of this technique for the stochastic block model with two balanced communities.  ( 2 min )
    A Wasserstein distance approach for concentration of empirical risk estimates. (arXiv:1902.10709v4 [math.ST] UPDATED)
    This paper presents a unified approach based on Wasserstein distance to derive concentration bounds for empirical estimates for two broad classes of risk measures defined in the paper. The classes of risk measures introduced include as special cases well known risk measures from the finance literature such as conditional value at risk (CVaR), optimized certainty equivalent risk, spectral risk measures, utility-based shortfall risk, cumulative prospect theory (CPT) value, rank dependent expected utility and distorted risk measures. Two estimation schemes are considered, one for each class of risk measures. One estimation scheme involves applying the risk measure to the empirical distribution function formed from a collection of i.i.d. samples of the random variable (r.v.), while the second scheme involves applying the same procedure to a truncated sample. The bounds provided apply to three popular classes of distributions, namely sub-Gaussian, sub-exponential and heavy-tailed distributions. The bounds are derived by first relating the estimation error to the Wasserstein distance between the true and empirical distributions, and then using recent concentration bounds for the latter. Previous concentration bounds are available only for specific risk measures such as CVaR and CPT-value. The bounds derived in this paper are shown to either match or improve upon previous bounds in cases where they are available. The usefulness of the bounds is illustrated through an algorithm and the corresponding regret bound for a stochastic bandit problem involving a general risk measure from each of the two classes introduced in the paper.  ( 2 min )
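    As a concrete instance of the first estimation scheme (applying a risk measure to the empirical distribution), the sketch below computes an empirical CVaR from i.i.d. loss samples. It uses a common discrete approximation, the average of the worst (1 - alpha) fraction of observations, and is not taken from the paper. The function name and sample values are illustrative assumptions.

```python
import math

def empirical_cvar(losses, alpha=0.95):
    """Average of the worst (1 - alpha) fraction of the observed losses."""
    ordered = sorted(losses, reverse=True)  # largest losses first
    # Number of tail samples; rounding guards against floating-point noise in (1 - alpha) * n.
    tail_size = max(1, math.ceil(round((1 - alpha) * len(ordered), 9)))
    tail = ordered[:tail_size]
    return sum(tail) / len(tail)

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # made-up loss observations
cvar_80 = empirical_cvar(sample, alpha=0.8)  # mean of the two largest losses
```

    The concentration bounds discussed in the abstract quantify how far such an empirical estimate can be from the true risk value as a function of the sample size and the tail behavior of the losses.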
    Differentially Private Learning with Adaptive Clipping. (arXiv:1905.03871v5 [cs.LG] UPDATED)
    Existing approaches for training neural networks with user-level differential privacy (e.g., DP Federated Averaging) in federated learning (FL) settings involve bounding the contribution of each user's model update by clipping it to some constant value. However, there is no good a priori setting of the clipping norm across tasks and learning settings: the update norm distribution depends on the model architecture and loss, the amount of data on each device, the client learning rate, and possibly various other parameters. We propose a method wherein instead of a fixed clipping norm, one clips to a value at a specified quantile of the update norm distribution, where the value at the quantile is itself estimated online, with differential privacy. The method tracks the quantile closely, uses a negligible amount of privacy budget, is compatible with other federated learning technologies such as compression and secure aggregation, and has a straightforward joint DP analysis with DP-FedAvg. Experiments demonstrate that adaptive clipping to the median update norm works well across a range of realistic federated learning tasks, sometimes outperforming even the best fixed clip chosen in hindsight, and without the need to tune any clipping hyperparameter.  ( 2 min )
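    The idea of estimating a quantile of the update norms online can be sketched with a geometric update rule. This is a simplified, non-private illustration: a differentially private version would add noise to the fraction computed below, which this sketch omits. The function name, learning rate and norm values are assumptions, not the authors' code.

```python
import math

def update_clip_norm(clip, update_norms, target_quantile=0.5, lr=0.2):
    """One geometric step moving `clip` toward the target quantile of the norms."""
    # Fraction of client updates whose norm already fits under the current clip.
    frac_below = sum(n <= clip for n in update_norms) / len(update_norms)
    # Shrink the clip if too many updates fit under it, grow it otherwise.
    return clip * math.exp(-lr * (frac_below - target_quantile))

norms = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]  # made-up update norms, median 2.5
clip = 0.1                                              # deliberately poor initial guess
for _ in range(200):                                    # one step per training round
    clip = update_clip_norm(clip, norms)
```

    Because the update is multiplicative, the clip can recover from an initial guess that is off by orders of magnitude in a modest number of rounds, and it then hovers around the chosen quantile.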
    Random Forests for Change Point Detection. (arXiv:2205.04997v1 [stat.ME])
    We propose a novel multivariate nonparametric multiple change point detection method using classifiers. We construct a classifier log-likelihood ratio that uses class probability predictions to compare different change point configurations. We propose a computationally feasible search method that is particularly well suited for random forests, denoted by changeforest. However, the method can be paired with any classifier that yields class probability predictions, which we illustrate by also using a k-nearest neighbor classifier. We provide theoretical results motivating our choices. In a large simulation study, our proposed changeforest method achieves improved empirical performance compared to existing multivariate nonparametric change point detection methods. An efficient implementation of our method is made available for R, Python, and Rust users in the changeforest software package.  ( 2 min )
    Moving Beyond Sub-Gaussianity in High-Dimensional Statistics: Applications in Covariance Estimation and Linear Regression. (arXiv:1804.02605v4 [math.ST] UPDATED)
    Concentration inequalities form an essential toolkit in the study of high dimensional (HD) statistical methods. Most of the relevant statistics literature in this regard is based on sub-Gaussian or sub-exponential tail assumptions. In this paper, we first bring together various probabilistic inequalities for sums of independent random variables under much more general exponential type (namely sub-Weibull) tail assumptions. These results extract a part sub-Gaussian tail behavior in finite samples, matching the asymptotics governed by the central limit theorem, and are compactly represented in terms of a new Orlicz quasi-norm - the Generalized Bernstein-Orlicz norm - that typifies such tail behaviors. We illustrate the usefulness of these inequalities through the analysis of four fundamental problems in HD statistics. In the first two problems, we study the rate of convergence of the sample covariance matrix in terms of the maximum elementwise norm and the maximum k-sub-matrix operator norm which are key quantities of interest in bootstrap, HD covariance matrix estimation and HD inference. The third example concerns the restricted eigenvalue condition, required in HD linear regression, which we verify for all sub-Weibull random vectors through a unified analysis, and also prove a more general result related to restricted strong convexity in the process. In the final example, we consider the Lasso estimator for linear regression and establish its rate of convergence under much weaker than usual tail assumptions (on the errors as well as the covariates), while also allowing for misspecified models and both fixed and random design. To our knowledge, these are the first such results for Lasso obtained in this generality. The common feature in all our results over all the examples is that the convergence rates under most exponential tails match the usual ones under sub-Gaussian assumptions.  ( 3 min )
    Stabilized Doubly Robust Learning for Recommendation on Data Missing Not at Random. (arXiv:2205.04701v1 [cs.LG])
    In recommender systems, users always choose favorite items to rate, which results in data missing not at random and poses a great challenge for unbiased evaluation and learning of prediction models. Currently, the doubly robust (DR) method and its variants have been widely studied and demonstrate superior performance. However, we show that DR methods are unstable to extremely small propensities and rely on extrapolations, resulting in sub-optimal performances. In this paper, we propose a stabilized doubly robust (SDR) estimator to address the above limitations while retaining double robustness. Theoretical analysis shows that SDR has bounded bias, variance and generalization error bound under inaccurate imputed errors and arbitrarily small propensities. In addition, we propose a novel learning approach for SDR that updates the imputation, propensity, and prediction models cyclically, achieving more stable and accurate predictions. Extensive experiments show that our approach significantly outperforms the existing methods.  ( 2 min )
    Fixed-point iterations for several dissimilarity measure barycenters in the Gaussian case. (arXiv:2205.04806v1 [stat.CO])
    In target tracking and sensor fusion contexts it is not unusual to deal with a large number of Gaussian densities that encode the available information (multiple hypotheses), as in applications where many sensors, affected by clutter or multimodal noise, take measurements on the same scene. In such cases reduction procedures must be implemented, with the purpose of limiting the computational load. In some situations it is required to fuse all available information into a single hypothesis, and this is usually done by computing the barycenter of the set. However, such computation strongly depends on the chosen dissimilarity measure, and most often it must be performed making use of numerical methods, since in very few cases the barycenter can be computed analytically. Some issues, such as the constraint that the covariance must be symmetric and positive definite, make the numerical computation of the barycenter of a set of Gaussians hard. In this work, Fixed-Point Iterations (FPI) are presented for the computation of barycenters according to several dissimilarity measures, making up a useful toolbox for fusion/reduction of Gaussian sets in applications where specific dissimilarity measures are required.  ( 2 min )
    A Communication-Efficient Distributed Gradient Clipping Algorithm for Training Deep Neural Networks. (arXiv:2205.05040v1 [cs.LG])
    In distributed training of deep neural networks or Federated Learning (FL), people usually run Stochastic Gradient Descent (SGD) or its variants on each machine and communicate with other machines periodically. However, SGD might converge slowly in training some deep neural networks (e.g., RNN, LSTM) because of the exploding gradient issue. Gradient clipping is usually employed to address this issue in the single machine setting, but exploring this technique in the FL setting is still in its infancy: it remains mysterious whether the gradient clipping scheme can take advantage of multiple machines to enjoy parallel speedup. The main technical difficulty lies in dealing with a nonconvex loss function, a non-Lipschitz continuous gradient, and skipped communication rounds simultaneously. In this paper, we explore a relaxed-smoothness assumption of the loss landscape which LSTM was shown to satisfy in previous works and design a communication-efficient gradient clipping algorithm. This algorithm can be run on multiple machines, where each machine employs a gradient clipping scheme and communicates with other machines after multiple steps of gradient-based updates. Our algorithm is proved to have $O\left(\frac{1}{N\epsilon^4}\right)$ iteration complexity for finding an $\epsilon$-stationary point, where $N$ is the number of machines. This indicates that our algorithm enjoys linear speedup. We prove this result by introducing novel analysis techniques of estimating truncated random variables, which we believe are of independent interest. Our experiments on several benchmark datasets and various scenarios demonstrate that our algorithm indeed exhibits fast convergence speed in practice and thus validates our theory.  ( 2 min )
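    The pattern the abstract describes (each machine takes several clipped gradient steps locally, then all machines communicate) can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the paper's algorithm: the names `clip`, `local_clipped_sgd` and `noisy_quadratic_grad`, the plain model averaging used as the communication step, and the noisy quadratic objective are all chosen here for illustration.

```python
import math
import random


def clip(grad, max_norm):
    """Rescale grad so its Euclidean norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm or norm == 0.0:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]


def local_clipped_sgd(grad_fn, x0, n_machines=4, rounds=50, local_steps=5,
                      lr=0.1, max_norm=1.0, seed=0):
    """Simulate machines that run clipped SGD locally and average periodically."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(rounds):
        replicas = []
        for _ in range(n_machines):
            local = list(x)
            # Several local updates happen between communication rounds.
            for _ in range(local_steps):
                g = clip(grad_fn(local, rng), max_norm)
                local = [w - lr * gi for w, gi in zip(local, g)]
            replicas.append(local)
        # Communication round (assumed here): coordinate-wise model averaging.
        x = [sum(col) / n_machines for col in zip(*replicas)]
    return x


def noisy_quadratic_grad(x, rng):
    """Stochastic gradient of 0.5 * ||x||^2 (an illustrative objective)."""
    return [xi + rng.gauss(0.0, 0.1) for xi in x]


solution = local_clipped_sgd(noisy_quadratic_grad, [10.0, -10.0])
```

    With these settings the machines communicate once per five local steps, which is the kind of communication saving the abstract refers to; the convergence rate itself is a property of the paper's analysis and is not demonstrated by this toy.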
    Turtle Score -- Similarity Based Developer Analyzer. (arXiv:2205.04876v1 [stat.ML])
    In day-to-day life, a highly demanding task for IT companies is to find the right candidates who fit the companies' culture. This research aims to comprehend, analyze and automatically produce convincing outcomes to find a candidate who perfectly fits right in the company. Data is examined and collected for each employee who works in the IT domain focusing on their performance measure. This is done based on various different categories which bring versatility and a wide view of focus. To this data, learner analysis is done using machine learning algorithms to obtain learner similarity and developer similarity in order to recruit people with identical working patterns. It's been proven that the efficiency and capability of a particular worker go higher when working with a person of a similar personality. Therefore this will serve as a useful tool for recruiters who aim to recruit people with high productivity. This is to say that the model designed will render the best outcome possible with high accuracy and an immaculate recommendation score.  ( 2 min )
    Don't Throw it Away! The Utility of Unlabeled Data in Fair Decision Making. (arXiv:2205.04790v1 [stat.ML])
    Decision making algorithms, in practice, are often trained on data that exhibits a variety of biases. Decision-makers often aim to take decisions based on some ground-truth target that is assumed or expected to be unbiased, i.e., equally distributed across socially salient groups. In many practical settings, the ground-truth cannot be directly observed, and instead, we have to rely on a biased proxy measure of the ground-truth, i.e., biased labels, in the data. In addition, data is often selectively labeled, i.e., even the biased labels are only observed for a small fraction of the data that received a positive decision. To overcome label and selection biases, recent work proposes to learn stochastic, exploring decision policies via i) online training of new policies at each time-step and ii) enforcing fairness as a constraint on performance. However, the existing approach uses only labeled data, disregarding a large amount of unlabeled data, and thereby suffers from high instability and variance in the learned decision policies at different times. In this paper, we propose a novel method based on a variational autoencoder for practical fair decision-making. Our method learns an unbiased data representation leveraging both labeled and unlabeled data and uses the representations to learn a policy in an online process. Using synthetic data, we empirically validate that our method converges to the optimal (fair) policy according to the ground-truth with low variance. In real-world experiments, we further show that our training approach not only offers a more stable learning process but also yields policies with higher fairness as well as utility than previous approaches.  ( 2 min )
    KEMP: Keyframe-Based Hierarchical End-to-End Deep Model for Long-Term Trajectory Prediction. (arXiv:2205.04624v1 [cs.CV])
    Predicting future trajectories of road agents is a critical task for autonomous driving. Recent goal-based trajectory prediction methods, such as DenseTNT and PECNet, have shown good performance on prediction tasks on public datasets. However, they usually require complicated goal-selection algorithms and optimization. In this work, we propose KEMP, a hierarchical end-to-end deep learning framework for trajectory prediction. At the core of our framework is keyframe-based trajectory prediction, where keyframes are representative states that trace out the general direction of the trajectory. KEMP first predicts keyframes conditioned on the road context, and then fills in intermediate states conditioned on the keyframes and the road context. Under our general framework, goal-conditioned methods are special cases in which the number of keyframes equals one. Unlike goal-conditioned methods, our keyframe predictor is learned automatically and does not require hand-crafted goal-selection algorithms. We evaluate our model on public benchmarks and our model ranked 1st on the Waymo Open Motion Dataset Leaderboard (as of September 1, 2021).  ( 2 min )
    A Probabilistic Generative Model of Free Categories. (arXiv:2205.04545v1 [cs.AI])
    Applied category theory has recently developed libraries for computing with morphisms in interesting categories, while machine learning has developed ways of learning programs in interesting languages. Taking the analogy between categories and languages seriously, this paper defines a probabilistic generative model of morphisms in free monoidal categories over domain-specific generating objects and morphisms. The paper shows how acyclic directed wiring diagrams can model specifications for morphisms, which the model can use to generate morphisms. Amortized variational inference in the generative model then enables learning of parameters (by maximum likelihood) and inference of latent variables (by Bayesian inversion). A concrete experiment shows that the free category prior achieves competitive reconstruction performance on the Omniglot dataset.  ( 2 min )
    Matrix and graph representations of vine copula structures. (arXiv:2205.04783v1 [stat.ML])
    Vine copulas can efficiently model a large portion of probability distributions. This paper focuses on a more thorough understanding of their structures. We are building on well-known existing constructions to represent vine copulas with graphs as well as matrices. The graph representations include the regular, cherry and chordal graph sequence structures, which we show equivalence between. Importantly we also show that when a perfect elimination ordering of a vine structure is given, then it can always be uniquely represented with a matrix. O. M. N\'apoles has shown a way to represent them in a matrix, and we algorithmify this previous approach, while also showing a new method for constructing such a matrix, through cherry tree sequences. Lastly, we prove that these two matrix-building algorithms are equivalent if the same perfect elimination ordering is being used.  ( 2 min )
    Real-time Forecasting of Time Series in Financial Markets Using Sequentially Trained Many-to-one LSTMs. (arXiv:2205.04678v1 [cs.LG])
    Financial markets are highly complex and volatile; thus, learning about such markets for the sake of making predictions is vital to make early alerts about crashes and subsequent recoveries. People have been using learning tools from diverse fields such as financial mathematics and machine learning in an attempt to make trustworthy predictions on such markets. However, the accuracy of such techniques had not been adequate until artificial neural network (ANN) frameworks were developed. Moreover, making accurate real-time predictions of financial time series is highly dependent on the ANN architecture in use and the procedure of training it. Long short-term memory (LSTM) is a member of the recurrent neural network family which has been widely utilized for time series predictions. Specifically, we train two LSTMs with a known length, say $T$ time steps, of previous data and predict only one time step ahead. At each iteration, while one LSTM is employed to find the best number of epochs, the second LSTM is trained only for the best number of epochs to make predictions. We treat the current prediction as part of the training set for the next prediction and train the same LSTM. While classic ways of training result in more error when the predictions are made further away in the test period, our approach is capable of maintaining a superior accuracy as training increases when it proceeds through the testing period. The forecasting accuracy of our approach is validated using three time series from each of the three diverse financial markets: stock, cryptocurrency, and commodity. The results are compared with those of an extended Kalman filter, an autoregressive model, and an autoregressive integrated moving average model.  ( 2 min )
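    The many-to-one, one-step-ahead loop in this abstract can be sketched in a few lines. This is a hedged illustration, not the authors' code: the LSTM is swapped for a trivial drift model so the example stays self-contained, the second LSTM used for epoch selection is omitted, and `make_windows`, `rolling_forecast`, `fit_drift` and `predict_drift` are names invented here.

```python
def make_windows(series, T):
    """Pair every run of T consecutive values with the value that follows it."""
    inputs = [series[i:i + T] for i in range(len(series) - T)]
    targets = [series[i + T] for i in range(len(series) - T)]
    return inputs, targets


def rolling_forecast(history, horizon, T, fit, predict):
    """Refit on all data so far, predict one step, then append the prediction."""
    history = list(history)
    forecasts = []
    for _ in range(horizon):
        inputs, targets = make_windows(history, T)
        model = fit(inputs, targets)
        next_value = predict(model, history[-T:])
        forecasts.append(next_value)
        # The prediction joins the training data for the next iteration.
        history.append(next_value)
    return forecasts


def fit_drift(inputs, targets):
    """Stand-in model (not an LSTM): average change from window end to target."""
    steps = [t - window[-1] for window, t in zip(inputs, targets)]
    return sum(steps) / len(steps)


def predict_drift(drift, window):
    """Predict the next value as the last observed value plus the drift."""
    return window[-1] + drift


forecasts = rolling_forecast(list(range(10)), horizon=3, T=3,
                             fit=fit_drift, predict=predict_drift)
```

    Any model exposing the same `fit`/`predict` pair could be dropped in; only the windowing and the append-then-retrain loop are meant to mirror the description above.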
    Entropic CLT for Order Statistics. (arXiv:2205.04621v1 [cs.IT])
    It is well known that central order statistics exhibit a central limit behavior and converge to a Gaussian distribution as the sample size grows. This paper strengthens this known result by establishing an entropic version of the CLT that ensures a stronger mode of convergence using the relative entropy. In particular, an order $O(1/\sqrt{n})$ rate of convergence is established under mild conditions on the parent distribution of the sample generating the order statistics. To prove this result, ancillary results on order statistics are derived, which might be of independent interest.  ( 2 min )
    A Unified Bayesian Framework for Pricing Catastrophe Bond Derivatives. (arXiv:2205.04520v1 [q-fin.PR])
    Catastrophe (CAT) bond markets are incomplete and hence carry uncertainty in instrument pricing. As such various pricing approaches have been proposed, but none treat the uncertainty in catastrophe occurrences and interest rates in a sufficiently flexible and statistically reliable way within a unifying asset pricing framework. Consequently, little is known empirically about the expected risk-premia of CAT bonds. The primary contribution of this paper is to present a unified Bayesian CAT bond pricing framework based on uncertainty quantification of catastrophes and interest rates. Our framework allows for complex beliefs about catastrophe risks to capture the distinct and common patterns in catastrophe occurrences, and when combined with stochastic interest rates, yields a unified asset pricing approach with informative expected risk premia. Specifically, using a modified collective risk model -- Dirichlet Prior-Hierarchical Bayesian Collective Risk Model (DP-HBCRM) framework -- we model catastrophe risk via a model-based clustering approach. Interest rate risk is modeled as a CIR process under the Bayesian approach. As a consequence of casting CAT pricing models into our framework, we evaluate the price and expected risk premia of various CAT bond contracts corresponding to clustering of catastrophe risk profiles. Numerical experiments show how these clusters reveal how CAT bond prices and expected risk premia relate to claim frequency and loss severity.  ( 2 min )

  • Open

    IsaacGym vs. Brax?
    Is someone able to compare IsaacGym to Brax? I'm particularly interested in how efficient Brax is in comparison to IsaacGym when running on a strong Desktop PC. Some companies might have the resources to waste energy on inefficient CPU computations, I don't. So I'm not really interested in the distributed computing part of Brax, at least as far as simulation on CPUs is concerned. submitted by /u/felixcra [link] [comments]  ( 1 min )
    Question about the OpenAI Five Dota 2 Bot
    I have been reading about this bot and it says it learned by playing an insane number of games, equivalent to many years of real time. My question is: how is it possible to play this many games? The only way I knew of previously when it comes to reinforcement learning is to completely re-code the game so you don't have to deal with the animations and time that come with actually playing the game and can speed it up. The only other way I could think of is if they got hundreds of thousands of different accounts to play Dota 2, but that's certainly not what they did. Does anyone know how they were able to speed the playing process up to play so many games? submitted by /u/TheGeniusSkipper [link] [comments]  ( 1 min )
    Callbacks in the __init__ method of a gym environment
    I am working with a multiagent gym environment and this is the init method: def __init__(self, world, reset_callback=None, reward_callback=None, observation_callback=None, info_callback=None, done_callback=None, post_step_callback=None, shared_viewer=True, discrete_action=True): Can someone explain what those callbacks are? I would like to send at each timestep the policy LSTM state to the env and I would like to do so by passing it in the step function along with the action. Does that make sense? submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
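    A constructor like the one quoted usually means the environment itself is generic and the caller supplies functions that define resets, rewards, observations and termination. The sketch below shows that pattern with a hypothetical `TinyMultiAgentEnv`; it is not the class from the post, and the callback signatures `(agent, world)` and the dictionary-based world are assumptions made for illustration.

```python
class TinyMultiAgentEnv:
    """Hypothetical stand-in showing how constructor callbacks get used."""

    def __init__(self, world, reset_callback=None, reward_callback=None,
                 observation_callback=None, done_callback=None):
        self.world = world
        self.reset_callback = reset_callback
        self.reward_callback = reward_callback
        self.observation_callback = observation_callback
        self.done_callback = done_callback

    def _call(self, callback, agent, default):
        # A missing callback falls back to a neutral default value.
        return default if callback is None else callback(agent, self.world)

    def reset(self):
        if self.reset_callback is not None:
            self.reset_callback(self.world)
        return [self._call(self.observation_callback, a, None)
                for a in self.world["agents"]]

    def step(self, actions):
        agents = self.world["agents"]
        # Toy dynamics: each action simply shifts the agent's position.
        for agent, action in zip(agents, actions):
            agent["pos"] += action
        obs = [self._call(self.observation_callback, a, None) for a in agents]
        rewards = [self._call(self.reward_callback, a, 0.0) for a in agents]
        dones = [self._call(self.done_callback, a, False) for a in agents]
        return obs, rewards, dones


def make_world():
    return {"agents": [{"pos": 0.0}, {"pos": 0.0}], "goal": 3.0}


def reset_world(world):
    for agent in world["agents"]:
        agent["pos"] = 0.0


env = TinyMultiAgentEnv(
    make_world(),
    reset_callback=reset_world,
    reward_callback=lambda agent, world: -abs(world["goal"] - agent["pos"]),
    observation_callback=lambda agent, world: [agent["pos"], world["goal"]],
    done_callback=lambda agent, world: agent["pos"] >= world["goal"],
)
first_obs = env.reset()
obs, rewards, dones = env.step([1.0, 3.0])
```

    The design choice is that task logic lives in the callbacks, so a different scenario can reuse the same environment class by passing different functions.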
    Books for engineers with ML experience?
    I've been working in ML and AI for a while, so I am familiar with deep neural networks and implementing them in PyTorch and TensorFlow. But every time I try to work on a reinforcement learning side project I end up with models that don't seem to learn anything useful. Part of it may just be that the reward space and learning process are different from, say, training a conv network to extract features and label images, which is a much easier problem to define. The state spaces, time component, and lack of clear labels just keep tripping me up. I'd love a book I can work through to help me get through these issues, but I don't really want to start with "what is a neural network" etc. submitted by /u/caedin8 [link] [comments]  ( 2 min )
    How to create a done and reward function for drone model?
    I want to implement the DDPG algorithm on a drone in the pybullet-drones simulator. In this simulator, I have the following observations of a drone: X, Y, Z position in WORLD_FRAME (in meters, 3 values); quaternion orientation in WORLD_FRAME (4 values); roll, pitch and yaw angles in WORLD_FRAME (in radians, 3 values); the velocity vector in WORLD_FRAME (in m/s, 3 values); angular velocity in WORLD_FRAME (3 values); motors' speeds (in RPMs, 4 values). I want the drone to follow a circular trajectory, and the episode to end when the drone hits the ground. So how can I create a reward and done function for this problem? submitted by /u/Better-Ad8608 [link] [comments]  ( 1 min )
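    One possible starting point, using only the position and velocity observations listed in the post, is sketched below. It is an assumption-laden example rather than a recommended design: the function names, the circle parameters, the `speed_weight` term and the `ground_height` threshold are all hypothetical, and nothing here calls the pybullet-drones API.

```python
import math


def circle_reward(position, velocity, center=(0.0, 0.0), radius=1.0,
                  target_height=1.0, speed_weight=0.1):
    """Penalize distance from the circle, reward motion along its tangent."""
    x, y, z = position
    dx, dy = x - center[0], y - center[1]
    r = math.hypot(dx, dy)
    radial_error = abs(r - radius)
    height_error = abs(z - target_height)
    if r > 0.0:
        # Unit tangent for counter-clockwise motion around the center.
        tx, ty = -dy / r, dx / r
        progress = velocity[0] * tx + velocity[1] * ty
    else:
        progress = 0.0  # tangent is undefined exactly at the center
    return -(radial_error + height_error) + speed_weight * progress


def is_done(position, ground_height=0.05):
    """End the episode once the drone is at or below the ground threshold."""
    return position[2] <= ground_height
```

    Without the tangential term the drone could hover at one point on the circle and still collect the maximum reward, which is why a small bonus for moving along the circle is included.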
    How to utilize existing knowledge while training the agent
    Hi all, I am currently trying to teach my robot-manipulator how to reach a goal position by considering the overall energy consumption. Here, I would like to integrate the existing knowledge such as "try to avoid using q1, as it consumes a lot of energy". How could I initialize the training by utilizing this knowledge to boost the training speed? submitted by /u/Fun-Moose-3841 [link] [comments]  ( 1 min )
    I am training a chess AI using self play. I want to train the AI and then save the model and make it play against the stockfish engine. If I set reset timestep to False will it still continue training the same model when I call model.learn again
    submitted by /u/madmax_wush [link] [comments]  ( 1 min )
    Help needed finding a "complex" RL environment to test an idea on
    Hi All, I am a master's student in AI and I have developed a new training method and have been testing it on a "simple" environment implemented in Python/Gym/Stable-baselines3/Pygame. My supervisor has said that I need to test the idea on a more "complex" environment, preferably with a high dimensional state space (perhaps in the hundreds or thousands). Are there any "complex" environments that would be easy to quickly implement in Python/Gym/Pygame? I am running out of time so can't build the environment from scratch. Thanks a lot in advance. submitted by /u/C_BearHill [link] [comments]  ( 1 min )
    Conv3D Gym Env?
    Hi all, I'm working on an RL project in the medical domain, and I want to use 3D grids as the observation for my agent. I have two questions: 1) Does anyone know of any work that's been done using 3D grids and Conv3D? Preferably as a Gym env! I've seen a lot of work with 3D environments but the observation is usually just an image of the 3D env. 2) Would it be better to use mesh or voxels as the input? Thank you! submitted by /u/leozinho2r [link] [comments]  ( 1 min )
    Evaluation seed
    I train with PPO using 64 envs, each of them having a different seed, and then evaluate on 10 envs with different seeds. Can you tell me please if it is recommended to use in evaluation a sample of the seeds used in the train, or if they must be completely different? submitted by /u/FaithlessnessSuper46 [link] [comments]  ( 1 min )
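    One common convention, offered here as an assumption rather than a rule from the post, is to keep the evaluation seeds disjoint from the training seeds so that evaluation measures behavior on unseen randomization. A minimal sketch with a hypothetical `split_seeds` helper:

```python
def split_seeds(n_train=64, n_eval=10, eval_offset=10_000):
    """Return training and evaluation seed lists that share no seed."""
    train_seeds = list(range(n_train))
    eval_seeds = list(range(eval_offset, eval_offset + n_eval))
    if set(train_seeds) & set(eval_seeds):
        raise ValueError("evaluation seeds overlap with training seeds")
    return train_seeds, eval_seeds


train_seeds, eval_seeds = split_seeds()
```

    Evaluating on a sample of the training seeds instead would answer a different question, namely how well the agent performs on conditions it has already seen.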
  • Open

    Window to the Fourth Dimension | @artificial_artists on Instagram
    submitted by /u/atster11 [link] [comments]
    Blood Bridge
    submitted by /u/Hacknaut [link] [comments]
    Hey everyone! I'm not sure if this is the place for this, but I made an Instagram to post all the art I create on Dream by Wombo. Some of the pieces are really cool and I'm gonna keep posting on there so I figured I'd share the page with all of you in case you're interested! @Artificial_artists
    submitted by /u/atster11 [link] [comments]  ( 1 min )
    Data Science Interview Questions
    Preparing for a data scientist interview is tough since the questions you will be asked about data science are unclear. An interviewer may surprise you with a set of unexpected questions, regardless of how much job experience you have or what data science credentials you possess. Read more submitted by /u/ridamughal110 [link] [comments]
    Use of AI in the Manufacturing Industry
    With continuous advancements in Artificial Intelligence (AI), manufacturers are moving to apply it in their manufacturing processes to boost product quality, operational efficiency, workforce safety, and more. Read more submitted by /u/ridamughal110 [link] [comments]
    Aiplague - Lost Underwater Garden (4K 60 FPS) Disco Diffusion
    submitted by /u/nalr00n [link] [comments]
    Doing body measurements with AI
    Hi r/artificial, I am developing an app in which I have to measure a person's measurements. I read somewhere that Artificial Intelligence is being used to measure houses etc. Is it already possible to use AI to do body measurements of people? For example, their arm or chest size? Also, I would like to know what the costs are of developing this feature. Can you guys help me pursue my idea? submitted by /u/notmycupofnft [link] [comments]  ( 1 min )
    AI Dream 44 - Epic Trippy Dream (Dragon Wave) 8x
    submitted by /u/LordPewPew777 [link] [comments]
    New AI Robot Chef R2-D-Chew Learns To 'Taste' To Improve Its Cooking
    submitted by /u/getrich_or_diemining [link] [comments]
    Overview of GraphSage for GNNs
    submitted by /u/aidev2040 [link] [comments]
    Question about Roko's Basilisk (I'm not here to create discussion, it's about a uni essay)
    Sorry if you are tired of this stupid basilisk, I don't believe in it, but I'm doing an essay about it and I have a question: Why does it only punish those who have heard about him? Why not punish everyone? I know it has to do about Pascal's wager and that "God doesn't punish those who don't know about them" but I still didn't really understand it. My google-fu is too weak for this one, I searched and didn't find anything. Thank you for taking the time for reading this! submitted by /u/Zholotoi [link] [comments]  ( 1 min )
    Ethical Concerns as a Result of AI!
    Machine learning is fast progressing thanks to neural networks for the following reasons: the number of data banks has exploded; processing power has had a massive boost; algorithms for machine learning have vastly improved. Read full article: https://us.sganalytics.com/blog/top-ethical-challenges-in-ai-the-price-of-progress/ submitted by /u/JencyJane [link] [comments]
    BlobGAN enables object manipulation in an image
    submitted by /u/imapurplemango [link] [comments]  ( 1 min )
    How did you learn AI?
    submitted by /u/SATelite_Media [link] [comments]  ( 1 min )
    Dynasties and Dystopia (made with starryai)
    submitted by /u/Losthel [link] [comments]  ( 1 min )
    Image Classification With TensorFlow.js
    submitted by /u/RubiksCodeNMZ [link] [comments]
    Deepmind's latest AI has better visual understanding by combining a visual model and a language model
    submitted by /u/Zirius_Sadfaces [link] [comments]
    How do I create GAN imagery like Brock Hampton’s visualizers for ROADRUNNER?
    submitted by /u/AMORALESPLATA [link] [comments]
    MIT Researchers Create ‘ExSum’: A Mathematical Framework To Evaluate Explanations Of Machine Learning Models And Quantify How Well People Understand Them
    Machine learning is frequently referred to as a “black box” because the interactions between input and output become increasingly opaque as the model’s complexity grows. People’s understanding of these models is confined to how data is entered and final choices are made, and there is not much clarity in how these models make their predictions. While previous research has been done to determine how accurate the explanations given by these models are, the question of how quickly and reliably individuals grasp these models remains an unexplored territory. Interpretability methods are being developed to comprehend better the functioning of blackbox models, which is necessary for their reliable deployment. As a stepping stone in this field, researchers from the Computer Science and Artificial Intelligence Laboratory and Microsoft Research have created ground-breaking research by developing a mathematical framework called explanatory summary (ExSUM) for evaluating and quantifying how well individuals understand machine learning models. ExSUM exposes different flaws in existing practice and aids in developing accurate model knowledge by identifying the model’s easily missed features. Other explanations, such as human alignment, robustness, and counterfactual minimality, are also included in the framework. The findings will be presented at the Conference of the North American Chapter of the Association for Computational Linguistics. Continue Reading Paper: https://arxiv.org/pdf/2205.00130.pdf Github: https://yilunzhou.github.io/exsum/ submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    27 audio samples based on Kendrick Lamar's 'Top of the Morning' hook & VQGAN + Clip art
    submitted by /u/gloriousapplecart [link] [comments]
  • Open

    [D] Thoughts on Seldon and their MLOps?
    Context: I am an AI Product Manager at a startup, trying to clean up their ML architecture (moving from batch inference to real-time at the moment). Just wondering if anyone has worked with Seldon before for their MLOps needs? They claimed their turnaround time could be a week (I am heavily skeptical of this), but they do claim to serve some bigger clients, so just wondering about folks' thoughts! submitted by /u/baxter3851 [link] [comments]  ( 1 min )
    [P] Accelerated Inference with Optimum and Transformers Pipelines
    Hey there 👋 It’s Lewis here from the open-source team at Hugging Face 🤗. I'm excited to share the latest release of our Optimum library, which provides a suite of performance optimization tools to make Transformers run fast on accelerated hardware! This release introduces a new set of inference classes, which resemble the autoclasses from Transformers, but run an ONNX model on ONNX Runtime. These classes are API compatible with ordinary PyTorch / TensorFlow models, which means you can run inference using the nifty `pipeline()` function from Transformers! Here's a quick example: from transformers import AutoTokenizer, pipeline from optimum.onnxruntime import ORTModelForQuestionAnswering # Load ONNX checkpoint using the ONNX Runtime inference class model = ORTModelForQuestionAnswering.…  ( 1 min )
    [R] RWKV-v2-RNN : A parallelizable RNN with transformer-level LM performance, and without using attention
    Hi guys. I am an independent researcher and you might know me (BlinkDL) if you are in the EleutherAI discord. I have built a RNN with transformer-level performance, without using attention. Moreover it supports both sequential & parallel mode in inference and training. So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding. https://github.com/BlinkDL/RWKV-LM I am training a L24-D1024 RWKV-v2-RNN LM (430M params) on the Pile (trained for 25B tokens as of now) with promising results, and it might reach GPT-Neo performance within 100B tokens. https://preview.redd.it/aftmbw4b4py81.png?width=852&format=png&auto=webp&s=56cf061dba0e90c79cbec7f37de86921bad3744c All of the trained models will be open-source. Inference is very fast (only matrix-vector multiplications, no matrix-matrix multiplications) even on CPUs, and I believe you can run a 1B params RWKV-v2-RNN with reasonable speed on your phone. It is inspired by Apple's AFT (https://arxiv.org/abs/2105.14103) with a number of my own tricks, such as: RNNify it, and use my CUDA kernel to speedup training (https://github.com/BlinkDL/RWKV-CUDA) Token-shift (https://github.com/BlinkDL/RWKV-LM#token-shift-time-shift-mixing) SmallInitEmb (https://github.com/BlinkDL/SmallInitEmb) which helps the embedding quality, and stabilizes Post-LN (which is what I am using). I also transferred some time-related parameters from a small model to a large model, to speed up the convergence. Basically the model learns to focus more on short-distance interactions in early layers, and long-distance interactions in later layers. https://preview.redd.it/ibk4ic0b6py81.png?width=865&format=png&auto=webp&s=78e4f794abd0fe25c8af8fd6634836a472e4120a Please feel free to ask questions :) submitted by /u/bo_peng [link] [comments]  ( 3 min )
    [D] Simple question - Does anyone know the size of BUFF dataset?
    Hi everyone, Upon reading [PIFuHD: Multi-Level pixel-aligned implicit function for high-resolution 3d human digitization](https://arxiv.org/abs/2004.00452), I noticed that they used the [BUFF dataset](https://buff.is.tue.mpg.de/index.html), which is open only for academic purposes. To access this data, I have to register an academic e-mail and send some declaration papers. I just want to know how big this dataset is, maybe more than a TB? Does anyone know how big it is? submitted by /u/Frequent-Desk9222 [link] [comments]  ( 1 min )
    [D] How to evaluate complementary datasets for ML models?
    Evaluating ML models is a fundamental task and subfield of the Machine Learning practice. On the other hand, I was not able to find any existing materials, guides, protocols, or papers on how to proceed with evaluating/scoring complementary datasets (resulting in multiple new features, a.k.a. a feature set) when added to an existing model (and retrained with these new features added). This can be described as a with/without comparative approach. Let's say that there is some kind of ensemble model (e.g. CatBoost, LightGBM) trained on some X_0 and y data with all the relevant testing metrics for that model type (classification or regression). We receive some new feature sets, X_1, X_2 ... X_n. Our goal is to get an overview whether [X_0 X_1], [X_0 X_2] .... [X_0 X_n] extended feature mat…  ( 2 min )
    [P] Coco Image Semantic Segmentation Dataset Generator Command Line Tool
    Hi everyone Just wanted to share a really small (micro) python command line utility that interfaces with `pycoco` library that allows you to generate tiff masks from the coco image dataset for training with semantic segmentation (i.e. UNet) where you can also filter by categories. Found it useful if you want to extract specific images from the Coco dataset for your own semantic segmentation project. Tiff images contain pixel values already representing class labels 0, 1, 2 etc... Hope someone else finds it useful! https://github.com/ralampay/pycocosegmentor submitted by /u/ralampay [link] [comments]  ( 1 min )
    [Research] Industry Analysis: The AI Fairness Toolkits Landscape
    A thoughtful approach to AI ethics is becoming increasingly important for all organizations deriving value from AI. We hope that providing an overview of the top toolkits and resources that exist (starting with Fairness and Robustness) will help more companies adopt AI responsibly, with ethical principles at the core. https://www.borealisai.com/en/blog/industry-analysis-ai-fairness-toolkits-landscape/ submitted by /u/BorealisAI [link] [comments]  ( 1 min )
    [D] Data Scientist, ML Research Scientist, MLE, and Data Engineer
    I am new to non-academic research, but it seems like data scientists sometimes get unnecessarily shat on. From talking to people and reading job descriptions, DS (especially at a startup) often seems closer to a Research Scientist than to an MLE, and especially a DE. If what you want to do is research and model development, it seems like DS is the best option next to RS, which seems to be the hardest position to get. I was initially wary of taking a DS role that might lock me out of being considered for future RS positions, but after getting more exposure I doubt that is true. What do you think? I'm curious to hear more opinions because lots of people/companies seem to have different definitions for these roles. submitted by /u/Althonse [link] [comments]  ( 2 min )
    [R] NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality
    submitted by /u/tobyoup [link] [comments]  ( 3 min )
    [P] Cluster and analyze text using both embedding and GPT models (interactive visualizations)
    Hi r/MachineLearning, You have likely seen UMAP visualizations of text embeddings before. I have been interested in imposing more information on such a figure to help in a text analysis flow. So I'd love to share two interactive exhibits with you: 1) Top 10,000 Hacker News articles of all time 2) Top 3,000 posts in Ask HN. I found the following steps produce an interesting result: (1) embed the text (for this example, I'm using the titles of the top scoring 10K posts on Hacker News); (2) cluster the embeddings (kmeans); (3) run UMAP and impose the clusters on the plot; (4) extract keywords from the clusters (using cTFIDF from the awesome BERTopic, which now also supports kmeans!); (5) cluster naming: this last step is a work in progress, which is to have a generative language model assign a better name to a cluster (using the extracted keywords). The post is linked below. In addition to surfacing super interesting discussions on life insights, software career insight, and content recommendations, it also contains a notebook and the dataset as well as the embeddings of the top 3K posts in Ask HN. Combing For Insight in 10,000 Hacker News Posts With Text Clustering https://txt.cohere.ai/combing-for-insight-in-10-000-hacker-news-posts-with-text-clustering/ Hope you find it useful! I'm fascinated by flows that use both generative and embedding models as these are, for now, two stable NLP methods in a fast-changing field. Feel free to share any experiences or projects you've done along those lines! submitted by /u/jayalammar [link] [comments]  ( 2 min )
    [P] New lightweight library for head-gesture detection
    The Nodding Pigeon library provides a pre-trained model and a simple inference API for detecting head gestures in short videos. Under the hood, it uses Google MediaPipe for collecting the landmark features. For ML practitioners, this project is also an example of using generative data from a small base-dataset for model training. Please take a look! :) https://github.com/bhky/nodding-pigeon submitted by /u/xtorch501 [link] [comments]  ( 1 min )
    Reconstructing data [Discussion]
    Hey guys, as a physicist, I learned supervised ML on the fly. I optimize multi-variable functions to reproduce quantum mechanics simulations. I have quite a different problem here. I would like to reconstruct data. I have thousands of lines of data, and each line has several attributes. One of those attributes is missing for some of the data. It can only take a few possible values. I need to reconstruct it based on the other attributes. I have started to dig into the possible methods I could use. Could you give me some advice and opinions to get me started? I would greatly appreciate any input. PS: I code in Python submitted by /u/ant_two [link] [comments]  ( 3 min )
    [R] DALLE-2 paper explained - model architecture, results and image manipulation
    Here is a video explaining the DALLE-2 model architecture: https://youtu.be/Z8E3LxqE49M The paper title is "Hierarchical Text-Conditional Image Generation with CLIP Latents" and the arXiv link to the paper is here: https://arxiv.org/abs/2204.06125 The official website is here: https://openai.com/dall-e-2 Hope it's useful. submitted by /u/Combination-Fun [link] [comments]
    [P] Open-source to speed up deep learning inference by leveraging multiple optimization techniques (deep learning compilers, quantization, half precision, etc)
    Hi everyone, my name is Emile. After spending a long time trying many tools to speed up AI inference, some colleagues and I built an open-source library that bundles the best techniques we could find into a single interface (today's release integrates most of these techniques). Let me know what you think of this OSS! Thank you. The library is called nebullvm; it takes your AI model as input and outputs an optimized version that runs 2-30 times faster on your hardware. Nebullvm tests multiple optimization techniques (deep learning compilers, quantization, sparsity, distillation, and more) to identify the optimal way to execute your AI model on your specific hardware. The library can speed up your model 2 to 10 times without loss of performance (thanks to deep learning compilers, e.g. tensorrt, openvino, mlir, etc.), or up to 20-30 times if you specify that you are willing to trade off a self-defined amount of accuracy/precision to achieve even lower latency and a lighter model (which might be useful for edge devices). This further acceleration is achieved by leveraging techniques that slightly modify the graph of the model to make it lighter, such as quantization, half precision, distillation, sparsity, etc. (more information about these techniques is on the nebullvm GitHub: https://github.com/nebuly-ai/nebullvm). The goal of nebullvm is to help other developers benefit from the most advanced inference optimization techniques without having to spend countless hours understanding, installing, testing and debugging these powerful technologies. I hope you enjoy the project, and let me know if you have any comments or advice, or what acceleration you get on your model/hardware. submitted by /u/emilec___ [link] [comments]  ( 2 min )
    [D] CIKM 2022 - Call for Demonstrations format
    As part of my PhD I have developed a web application in which one can apply different explainability methods to deep learning models and visualize the outcomes. The web application also addresses some specific use-cases which I (currently) think have not been addressed thus far. The application is in an early prototype stage, which is why I was considering submitting a demonstration paper to the CIKM demonstrations track, since it seems to fit pretty well with the format of papers that have been accepted as demos in previous years. However, I am not familiar with what the presentation format looks like if a demo paper gets accepted. Is it similar to a poster session where, in the case of a demo, one would stand beside their laptop and demonstrate the application to other attendees as they pass by, or does everyone with an accepted demo paper get a small time slot where they have 15-20 minutes to present the application and then receive questions? I would appreciate it if anyone who attended previous CIKM demo tracks could explain the format a bit. submitted by /u/purposeless_username [link] [comments]  ( 1 min )
    [D] Is it true that every algorithm to detect deepfakes can be used to generate better deepfakes?
    This position seems somehow rational but I seem to remember I read somewhere, a long time ago, that this was indeed not the case but I can't remember what the argument was based upon. Is the game of detecting/creating deepfakes really a cat and mouse game? submitted by /u/kugkfokj [link] [comments]  ( 3 min )
    [P] Deep RL Zoo - A collection of Deep RL algorithms implemented with PyTorch
    Hi guys, I recently created a new repo on GitHub. It contains a list of RL agents for solving discrete action space problems like classic control and Atari games. It includes the most recent algorithms from DeepMind like Never Give Up and Agent57 (although not fully tested on Atari games yet because of a lack of hardware resources). Hope you will find it helpful. https://github.com/michaelnny/deep_rl_zoo The post was originally posted on r/reinforcementlearning a few days ago, but I thought re-posting here might reach more people. Have a good day! submitted by /u/Top_Serve_2348 [link] [comments]  ( 1 min )
  • Open

    Is Data Mesh Fool’s Gold? Creating a Business-centric Data Strategy
    After talking to many customers recently at Dell Technologies World, I am very, very (very!) concerned how many organizations are putting their Data Strategy success into the hands of Data Meshes. Sorry, but I think the way that IT organizations are thinking about a Data Mesh is fool’s gold. I think the data mesh (along… The post Is Data Mesh Fool’s Gold? Creating a Business-centric Data Strategy appeared first on Data Science Central.  ( 6 min )
    Top Technological HR Skills for the HR Professional
    The Human Resource Management department had a very challenging time due to the pandemic. Globally, the coronavirus pandemic has created a lot of havoc. People were more concerned about health and safety. This pandemic has given rise to new realities like: social distancing, online (virtual) work, self-isolation, lockdowns, shutdowns, quarantine, shelter-in-place, and essential businesses. These concepts gave… The post Top Technological HR Skills for the HR Professional appeared first on Data Science Central.  ( 4 min )
  • Open

    Create video subtitles with Amazon Transcribe using this no-code workflow
    Subtitle creation on video content poses challenges no matter how big or small the organization. To address those challenges, Amazon Transcribe has a helpful feature that enables subtitle creation directly within the service. There is no machine learning (ML) or code writing required to get started. This post walks you through setting up a no-code […]  ( 9 min )
  • Open

    Overview of GraphSage for GNNs
    submitted by /u/aidev2040 [link] [comments]
    Image Classification With TensorFlow.js
    submitted by /u/RubiksCodeNMZ [link] [comments]
  • Open

    Here is how an R-Learning agent beats humans
    Human intuition is usually good at dealing with concepts like averages or mean-values, whereas it often performs poorly when it comes to…  ( 9 min )
  • Open

    Creator Karen X. Cheng Brings Keen AI for Design ‘In the NVIDIA Studio’
    The future of content creation is in AI. This week In the NVIDIA Studio, discover how AI-assisted painting is bringing a new level of inspiration to the next generation of artists. The post Creator Karen X. Cheng Brings Keen AI for Design ‘In the NVIDIA Studio’ appeared first on NVIDIA Blog.  ( 4 min )
  • Open

    Approximating a golden spiral with circular arcs
    The previous post included this image of a logarithmic spiral passing through the corners of squares in a sequence of golden rectangles. The portion of the spiral in each square looks like a quarter of a circle. How well would circular arcs approximate the spiral? Very well. Here’s a plot. The circular arc inside the […] Approximating a golden spiral with circular arcs first appeared on John D. Cook.  ( 1 min )

  • Open

    [P] It's settled: AutoArima is a lot(!) faster and more accurate than FB-Prophet. Now you can replace it with just two lines of code without making changes to your pipeline
    We benchmarked on more than 100K series and show that you can improve MAPE forecast accuracy by 17% with 37x less computational time using Nixtla's StatsForecast. That's the difference between paying $10 or $296 on AWS. It’s time to overcome the false prophets. Check Nixtla's FB-Prophet adapter: https://github.com/Nixtla/statsforecast/tree/main/experiments/arima_prophet_adapter https://preview.redd.it/fs3zqm4pcjy81.png?width=1280&format=png&auto=webp&s=326c82f04fae7df8934a434ba03fd5e683eadb99 The two lines you need submitted by /u/fedegarzar [link] [comments]  ( 1 min )
    [D] What are some good platforms to host new datasets published in ML conferences?
    I recently published an NLP paper which included a new dataset. What are some good platforms to host new datasets (our dataset is around 5GB)? Preferably platforms that allow adding citation information and have low costs! I am currently checking out Hugging Face, but it seems to have some limitations on size and dataset type. submitted by /u/shivamag99 [link] [comments]  ( 1 min )
    [D] Looking for the name of a particular regression technique
    This is a method I've read about somewhere, but I can't remember the reference or its name. You have (say) n = 500 points in two dimensions. You draw a line for each pair of points. That is, you have n(n-1)/2 lines. If n is too large, you can sample a few thousand lines. The envelope of these lines is your prediction band. The confidence level depends on the number of lines in your plot. It is entirely model-free and data driven. It also works if, instead of a line, you use a 2-parameter curve (say you are dealing with logistic regression rather than linear regression). Note that one of the lines will be the best fit according to L1 (that is, the one minimizing the absolute residual error), while the traditional regression line is the best L2 fit (minimizing the square of the residual error). The L1 version is more robust against outliers. This generalizes to d dimensions, with lines replaced by hyperplanes or hyper-surfaces. In this case, you need to look at all combinations of d points taken together to determine all the hyperplanes. In practice, you only use a sample. How is the method referred to in the literature? What would be a good reference on the topic? submitted by /u/MLRecipes [link] [comments]  ( 1 min )
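    A minimal Python sketch of the construction described in this post, under stated assumptions: the function names are hypothetical, pairs with equal x are skipped, and the `trim` parameter is an assumed way to narrow the band (the post does not say how the confidence level is tuned). It is not a reference implementation of any named method.

```python
import itertools
import random


def pairwise_lines(points, max_lines=2000, seed=0):
    """Return (slope, intercept) for lines through pairs of 2-D points.

    Pairs sharing the same x are skipped because the line is vertical.
    If there are more than max_lines usable pairs, a random sample is taken.
    """
    pairs = [(p, q) for p, q in itertools.combinations(points, 2) if p[0] != q[0]]
    if len(pairs) > max_lines:
        pairs = random.Random(seed).sample(pairs, max_lines)
    lines = []
    for (x1, y1), (x2, y2) in pairs:
        slope = (y2 - y1) / (x2 - x1)
        lines.append((slope, y1 - slope * x1))
    return lines


def envelope_at(lines, x, trim=0):
    """Lowest and highest line value at x, after dropping `trim` lines per side."""
    values = sorted(m * x + c for m, c in lines)
    return values[trim], values[len(values) - 1 - trim]


def l1_best_line(lines, points):
    """Candidate line with the smallest sum of absolute residuals."""
    def abs_error(line):
        m, c = line
        return sum(abs(y - (m * x + c)) for x, y in points)
    return min(lines, key=abs_error)
```

    Evaluating `envelope_at` on a grid of x values traces the band; `l1_best_line` picks the L1-best candidate among the pairwise lines, which is the robustness property the post mentions.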
    [D] Beyond Message Passing: a Physics-Inspired Paradigm for Graph Neural Networks
    Hi there, The Gradient has a new article that many of you might find interesting - Beyond Message Passing: a Physics-Inspired Paradigm for Graph Neural Networks . The message-passing paradigm has been the “battle horse” of deep learning on graphs for several years, making graph neural networks a big success in a wide range of applications, from particle physics to protein design. From a theoretical viewpoint, it established the link to the Weisfeiler-Lehman hierarchy, allowing to analyse the expressive power of GNNs. We argue that the “node and edge-centric” mindset of current graph deep learning schemes imposes strong limitations that hinder future progress in the field. As an alternative, we propose physics-inspired “continuous” learning models that open up a new trove of tools from the fields of differential geometry, algebraic topology, and differential equations so far largely unexplored in graph ML. submitted by /u/regalalgorithm [link] [comments]  ( 1 min )
    [N] Hugging Face raised $100M at $2B to double down on community, open-source & ethics
    👋 Hey there! Britney Muller here from Hugging Face. We've got some big news to share! Hugging Face Full Series C Announcement: https://huggingface.co/blog/series-c TechCrunch: https://techcrunch.com/2022/05/09/hugging-face-reaches-2-billion-valuation-to-build-the-github-of-machine-learning/ We want to have a positive impact on the AI field. We think the direction of more responsible AI is through openly sharing models, datasets, training procedures, evaluation metrics and working together to solve issues. We believe open source and open science bring trust, robustness, reproducibility, and continuous innovation. With this in mind, we are leading BigScience, a collaborative workshop around the study and creation of very large language models gathering more than 1,000 researchers of all backgrounds and disciplines. We are now training the world's largest open source multilingual language model 🌸 Over 10,000 companies are now using Hugging Face to build technology with machine learning. Their Machine Learning scientists, Data scientists and Machine Learning engineers have saved countless hours while accelerating their machine learning roadmaps with the help of our products and services. ⚠️ But there’s still a huge amount of work left to do. At Hugging Face, we know that Machine Learning has some important limitations and challenges that need to be tackled now like biases, privacy, and energy consumption. With openness, transparency & collaboration, we can foster responsible & inclusive progress, understanding & accountability to mitigate these challenges. Thanks to the new funding, we’ll be doubling down on research, open-source, products and responsible democratization of AI. submitted by /u/Britney-Ramona [link] [comments]  ( 3 min )
    [D] Are there any software engineers that switched into a machine learning role and found it a lot more stressful due to deadlines combined with the uncertainty of research?
    I found that after switching to an ML role, where I am both managing and improving data pipelines while experimenting with different ML models and doing feature engineering, the work is a lot more stressful than a standard SWE job. As a SWE I had tasks, and I knew I was capable of doing all of them given some amount of time, and I just peacefully worked my way through them and had a lot of down time. With ML, I am continually trying new things and having to wait for the model to train and be evaluated, only to come back with incremental to zero improvements while still having to meet monthly or quarterly milestones. I don’t have the guarantee that I can complete a task given some amount of time. There’s so much uncertainty in the ML research process, and the stress is making me want to switch back to SWE. The data pipelining and SWE parts are easy; it’s working out why my model isn’t working, and what the next appropriate step to improve it is, that is difficult for me. 90% of the time the things I try don’t lead to any improvement. To clear things up, I don’t have a PhD in statistics or CS, only a BS/MS in CS, so while I studied ML from the CS side, my statistical understanding of ML isn’t the best, which I believe may be contributing to this. Anyone else deal with this? Is this a common feeling in this field? submitted by /u/Legitimate_Bison3756 [link] [comments]  ( 10 min )
    [D] Stats person in a team with all computer science colleagues....
    This is something I am really curious to know.... Are there guys out there with 'predominantly stats' background who are working in a machine learning / deep-learning team with all colleagues from computer science degrees? If so, could you please respond with your experience on: - What are the major challenges you are facing, if any. - Do you find the thought processes and approach of your colleagues to data, model etc. very different from yours? Does it cause any issues in your work? - Do you ever feel overwhelmed with software development practices, installation, code check-in issues etc. etc. ? For slightly better context: - By 'predominantly stats' background, I mean folks with primary degrees in stats, with basic training in coding in Python R etc but little or no training in stuff like ML engineering, building pipelines, code check-in check-out, developing code for very complex deep learning models with many classes, modules, libraries etc. etc. I'd really appreciate an in-depth answer. submitted by /u/sol2296 [link] [comments]  ( 2 min )
    Pros/cons of using model on different data without retraining? [D]
    Hi all, I'm looking into a classification problem with each observation being an aggregated geographical area, which I've trained some classifying algorithms on. The response variable is a binary indicator of poverty. My question is what are benefits or drawbacks of using my best model without re-training it on even lower disaggregated data or data for different time periods? Thanks for your thoughts! submitted by /u/Pickles654 [link] [comments]  ( 1 min )
    [D] What are some ways to improve your *science* for a top-tier ML conference?
    There seemed to be some consensus in the responses to yesterday’s post asking about ways to improve your paper for a top-tier ML conference—the primary thing you should do is ensure your research makes a top-tier worthy contribution. That itself seemed like it might be worthy of further exploration, so I thought I would continue the discussion: What traits characterize a high value research idea or insight? What are good strategies for pursuing one? submitted by /u/EmmyNoetherRing [link] [comments]  ( 2 min )
    [D] [R] Does anyone know how to implement the MostPop algorithm in recommendation systems?
    The MostPop algorithm is basically an algorithm that calculates metrics based on the most popular items at the time. This algorithm keeps popping up in papers but there is no description of it being used to calculate RMSE or such errors. Using the knowledge that I currently have I can calculate metrics such as Hit Rate (how many users have interacted with the item i) but is there any way that I can use a modification of this algorithm or this algorithm itself to predict what rating would be given to item "i" by user "x"? submitted by /u/GustaMusto [link] [comments]  ( 1 min )
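    A minimal Python sketch of a popularity baseline, under stated assumptions: all function names are hypothetical, and the rating predictor swaps in an item-mean baseline (popularity counts alone do not produce a rating), so it is one plausible modification rather than a definition of MostPop from any paper.

```python
from collections import Counter, defaultdict


def most_popular(interactions, k=10):
    """Rank items by interaction count and return the top k item ids."""
    counts = Counter(item for _user, item in interactions)
    return [item for item, _count in counts.most_common(k)]


def hit_rate(recommended, held_out):
    """Fraction of users whose held-out item is in the recommended list."""
    hits = sum(1 for item in held_out.values() if item in recommended)
    return hits / len(held_out)


def item_mean_predictor(ratings):
    """Assumed rating baseline: every user gets the item's mean rating."""
    sums, counts = defaultdict(float), Counter()
    for _user, item, rating in ratings:
        sums[item] += rating
        counts[item] += 1
    global_mean = sum(sums.values()) / sum(counts.values())

    def predict(_user, item):
        # Unseen items fall back to the global mean rating.
        return sums[item] / counts[item] if counts[item] else global_mean

    return predict


def rmse(predict, test_ratings):
    """Root mean squared error of a predictor over (user, item, rating) triples."""
    squared = [(predict(u, i) - r) ** 2 for u, i, r in test_ratings]
    return (sum(squared) / len(squared)) ** 0.5
```

    The same list from `most_popular` is recommended to every user, which is why ranking metrics such as hit rate apply directly, while RMSE needs a separate rating baseline like the item mean used here.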
    [D] IJCAI 2022 camera ready review notification.
    Has anyone got the email from the editors after submitting the camera ready ? submitted by /u/random_effective [link] [comments]
    [D] Is there any AI solution to "fill in" images between images?
    For example, I provide a 2D image of a stickman on the left of the frame and a second image of the stickman on the right of the frame (as if he was moving from left to right). Would there be an AI solution that would output an image of this stickman in the middle of the frame? That is, an AI solution that guesses that this is a stickman moving from left to right and creates the image in between the 2 images I provided. I'm asking this while thinking about creating an animation using AI where I just provide the "main" images/drawings and it "fills out" the rest. I googled a lot and couldn't find anything suitable for this. I'd love to hear your recommended libs/GitHub profiles/notebooks that could help me achieve this. I'm a musician and data science practitioner and I'd love it if I could unite these two worlds in a video clip. submitted by /u/refrigerador82 [link] [comments]  ( 2 min )
    [D] Looking for ideas on NLP project
    Hello, I am trying to use text mining/text processing/NLP to apply certain changes to a big text file. More precisely, I have a big text file in HTML format (specifically a law). This law is then amended or changed by another law (a much smaller HTML file). For example, a law can be something like: "The ten commandments: (1) Thou shalt have no other gods before me; (2) Thou shalt not make unto thee any graven image; (3) Thou shalt not take the name of the Lord thy God in vain; etc." Then a change would be something like: "Changes to the Ten commandments: In The ten commandments, item 1 is changed to 'Thou shalt not murder'. Item 2 is removed. In item 3, after the word 'vain', a comma is added followed by the sentence 'unless the situation calls for it'." The output needs to be: "The ten commandments: Thou shalt not murder; Thou shalt not take the name of the Lord thy God in vain, unless the situation calls for it." It's a really dumb example but that's the gist of it. The laws are on average 10 to 40 pages long while the changes are usually a couple of pages long. Basically, the input would be one giant text file and either one or, if possible, multiple text files that contain changes to be made to the original file. The output would be an amended law with all the changes applied. The language of the text is not English or any other popular language. The changes are not written in a standardized way and there is also declension and typos to consider, which is why I would prefer using an ML approach rather than analysing the text by brute force. How would you go about approaching a problem like this? I have some experience with neural nets and machine learning, but very little in NLP so I am pretty lost. submitted by /u/WaferPlenty677 [link] [comments]  ( 1 min )
    [R] Happy to share my paper and Python code on efficient implementation of incremental proximal-point method for training machine-learning models.
    Paper: https://arxiv.org/abs/2205.01457 Code: https://github.com/alexshtf/inc_prox_pt Models are often trained using variants of the gradient update rule: xₜ₊₁ = xₜ-β∇ƒ(xₜ) where x is the model parameter vector, and ƒ is the cost function associated with the current (mini-batch of) training sample. This rule has another well-known interpretation - the proximal view: xₜ₊₁ = argmin { ƒ(xₜ) + ⟨∇ƒ(xₜ), x-xₜ⟩ + 1/(2β) ‖x-xₜ‖² }, meaning "balance between minimizing a linear approx. of ƒ at xₜ and being close to xₜ". The step-size β determines the balance between the two opposing forces. So what happens if we replace the linear approximation with the cost function itself? We obtain xₜ₊₁ = argmin { ƒ(x) + 1/(2β) ‖x-xₜ‖² }. This is called the "proximal operator of ƒ", and algorithms of this sort are called "proximal-point methods". So why not always exploit the cost itself, instead of just its slope? What's the catch? Depending on the complexity of ƒ, the above can be extremely challenging to compute. So why should we bother? Various works, for example https://arxiv.org/pdf/1903.08619.pdf by Asi and Duchi, show that such methods provide better resilience to step-size choice, which may make hyper-parameter tuning much cheaper - just guess a few step-sizes and you will get decent performance. Various extensions and works in online optimization also show nice stability properties, such as resilience to temporal variability of the adversarial cost function. In this paper I developed efficient algorithms for computing the above step for a variety of functions ƒ useful in machine learning, and published a Python library based on PyTorch, which practitioners can use for some problem classes described in the paper, and researchers developing new proximal-point methods can use it to numerically demonstrate their brand-new "super proximal-point method with some cool momentum" on interesting problems. submitted by /u/alexsht1 [link] [comments]  ( 3 min )
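    A small worked example may help show why the proximal step tolerates a badly chosen step size. The sketch below is not code from the linked library; it assumes a single-sample least-squares cost ƒ(x) = ½(a·x − b)² and writes the proximity penalty as ‖z − x‖²/(2β), so that β plays the same role as the gradient step size. Setting the gradient of the proximal objective to zero gives the closed form z = x − [β(a·x − b)/(1 + β‖a‖²)]·a. Function names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def gradient_step(x, a, b, beta):
    """One gradient step on f(x) = 0.5 * (a.x - b)**2 with step size beta."""
    residual = dot(a, x) - b
    return [xi - beta * residual * ai for ai, xi in zip(a, x)]


def proximal_point_step(x, a, b, beta):
    """Exact minimizer over z of 0.5 * (a.z - b)**2 + ||z - x||**2 / (2 * beta)."""
    residual = dot(a, x) - b
    scale = beta * residual / (1.0 + beta * dot(a, a))
    return [xi - scale * ai for ai, xi in zip(a, x)]


if __name__ == "__main__":
    x0, a, b, beta = [0.0, 0.0], [1.0, 2.0], 5.0, 10.0  # deliberately large step
    for name, step in (("gradient", gradient_step), ("proximal", proximal_point_step)):
        x1 = step(x0, a, b, beta)
        print(name, "residual before:", dot(a, x0) - b, "after:", dot(a, x1) - b)
```

    With this deliberately oversized β the gradient step overshoots and the residual grows, while the proximal step shrinks the residual by the factor 1/(1 + β‖a‖²), which is always below one.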
    [D] What is the largest / most diverse GAN model currently out there?
    Hi community, I'm currently building a fork of StyleCLIP global directions which allows you to control multiple semantic parameters simultaneously to generate and edit an image with StyleGAN and CLIP in real time. I want to showcase its potential as a design tool. Unfortunately, GAN weights are trained on very domain-specific data (faces, cars, churches). This makes them inferior to modern diffusion models, which I can use to generate whatever comes to mind. Although I know we won't have a GAN-based DALL-E counterpart anytime soon, I would still love to use my system with weights that can output a wide variety of things. So what are the biggest pretrained GAN models out there? Did anyone try to train one on LAION-400M or similar? submitted by /u/_chinatown [link] [comments]  ( 1 min )
    [D] What are some ways to improve your paper for a top-tier ML conference?
    ^^^ I'm sure this is a really broad question, with no one-size-fits-all answer. I am an undergrad who is very new to this research field and still don't have a good sense of what constitutes a well-written, top-conference-worthy paper, so I would welcome any thoughts. From what I've seen, some of the best-written papers have the following: - A very clear statement of the impact of their contribution, usually near the beginning. This tells the reader why they should care about this work (and to the reviewer why they should advocate for acceptance :D). - Usually a compelling teaser image in the second column of the first page (this is also good for Twitter publicity in the future). - Not sure if this is specific to my subfield (NLP), but the main results are enumerated and bolded at the …  ( 5 min )
    Calling all Developers [P]
    Want to help people stop being frustrated by those pesky menu bots that run you through different options? I'm looking for an Alexa-like AI voice assistant that can solve problems it has learned before and forward the problems it doesn't know to customer service. If interested, comment below; if you think it's dumb, let me know too. All views are welcome here. submitted by /u/Rhiquire [link] [comments]
  • Open

    Classification & Generation of Microscopy Images for Malaria via Artificial Neural Networks in Ghana
    submitted by /u/pasticciociccio [link] [comments]
    A.I. Farms?
    While looking through some dall-e 2 pictures I thought to myself, are there companies building ai farms in a way? Basically taking requests from other companies to train an ai to do a certain thing. That could be a money maker but as always someone probably already thought of that? submitted by /u/Swaggyswaggerson [link] [comments]
    Python AI for files
    I have an exercise in which I will get about 20 files, each with 5000 values in 6 columns (acceleration X, Y, Z and gyro X, Y, Z). The task is to code an AI for a smart watch that can detect whether the user fell down or is, for example, just shaking their hand. Each of the 20 files represents either a fall or no fall. I have to code and train that AI in Python, but have no idea how. I don't have the 20 files yet, so I have to write a kind of template into which I can insert the files, so that I can train the AI as soon as I get them. At first I misunderstood the exercise and thought I would get only 6 values in one row, and coded the AI as you see in the picture. But then I realized that a fall does not correspond to one specific value of acceleration and gyro, but rather to an interval of values. Then I remembered that each file contains a 5-second recording in which one value was recorded every 10 milliseconds and written to the file. But I really have no idea how to compare files with 5000 values in 6 columns, or rather how to write an AI that, given such a file, can predict whether it is a fall or not. submitted by /u/Specific-Feeling-666 [link] [comments]  ( 2 min )
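    One common way to compare whole recordings is to collapse each file into a short, fixed-length feature vector and train a classifier on those vectors. The Python sketch below is a template under stated assumptions: the file layout (headerless CSV, six numeric columns), the chosen summary features, and the nearest-centroid classifier are all illustrative stand-ins, and every function name is hypothetical.

```python
import csv
import math
import statistics


def load_recording(path):
    """Read one recording; assumes a headerless CSV with six numeric columns."""
    with open(path, newline="") as handle:
        return [[float(v) for v in row[:6]] for row in csv.reader(handle) if row]


def window_features(rows):
    """Summarize a recording by statistics of acceleration and gyro magnitude."""
    acc = [math.sqrt(r[0] ** 2 + r[1] ** 2 + r[2] ** 2) for r in rows]
    gyro = [math.sqrt(r[3] ** 2 + r[4] ** 2 + r[5] ** 2) for r in rows]
    features = []
    for signal in (acc, gyro):
        features += [max(signal), min(signal),
                     statistics.fmean(signal), statistics.pstdev(signal)]
    return features


def fit_centroids(feature_vectors, labels):
    """Average the feature vectors of each class (nearest-centroid training)."""
    groups = {}
    for vec, label in zip(feature_vectors, labels):
        groups.setdefault(label, []).append(vec)
    return {label: [statistics.fmean(col) for col in zip(*vecs)]
            for label, vecs in groups.items()}


def predict(centroids, vec):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))
```

    Once the files arrive, each one would go through `load_recording` and `window_features`, and the resulting vectors plus their fall/no-fall labels through `fit_centroids`. With only about 20 recordings, any classifier should be checked with held-out files before it is trusted.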
    Here's a repository where I try to keep up with the most interesting research papers of 2022. It is a curated list of the latest breakthroughs in AI and Data Science by release date with a clear video explanation, link to a more in-depth article, and code (if applicable).
    submitted by /u/OnlyProggingForFun [link] [comments]  ( 1 min )
    Last Week in AI: New methods to detect deepfakes, AI for smarter farming, AI to detect heart problems from Apple Watch, and more!
    submitted by /u/regalalgorithm [link] [comments]
    This deep learning technique solves one of the tough challenges of robotics
    submitted by /u/bendee983 [link] [comments]
    Researchers at the Imperial College London have shown it is possible to perform artificial intelligence using tiny nanomagnets that interact like neurons in the brain.
    submitted by /u/DutchTechJunkie [link] [comments]  ( 1 min )
    Introduction to Tensorflow.js with Real-World Example
    submitted by /u/RubiksCodeNMZ [link] [comments]
    Google AI Introduces A Method For Automating Inter- And Intra-Operator Parallelism For Distributed Deep Learning
    The rapidly rising size of deep learning models has swiftly overtaken the memory capacity of single accelerators in recent years. Earlier models, such as BERT (with a parameter size of 1GB), can scale across accelerators easily using data parallelism, which duplicates model weights across accelerators while splitting and distributing training data. Recent huge models, such as GPT-3 (with a parameter size of 175GB), can only be scaled through model-parallel training, which partitions a single model across many machines. While model parallelism makes it possible to train very large models, it is more challenging to implement because it must be tailored to the target neural network and compute cluster. Megatron-LM, for example, uses a model parallelism technique that splits weight matrices by rows or columns and then synchronizes the results across devices. Pipeline parallelism, by contrast, divides the operators of a neural network into several groups and the input data into micro-batches that run in a pipelined fashion. Paper: https://arxiv.org/pdf/2201.12023.pdf Code: https://github.com/alpa-projects/alpa submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    Does anybody have at least some idea what this AI is called 🐶😳😏✨
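    To make the idea of splitting a weight matrix by columns concrete, here is a minimal pure-Python sketch. It is not Alpa or Megatron-LM code: the "devices" are simulated by list shards, and the function names are made up for illustration. Each shard computes its slice of the output independently, and concatenating the slices reproduces the full matrix product.

```python
def matmul(a, b):
    """Plain matrix product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def split_columns(w, parts):
    """Partition the columns of w into contiguous shards, one per simulated device."""
    n_cols = len(w[0])
    step = -(-n_cols // parts)  # ceiling division
    return [[row[i:i + step] for row in w] for i in range(0, n_cols, step)]


def sharded_matmul(x, w, parts):
    """Each simulated device multiplies x by its column shard; outputs are concatenated."""
    partial = [matmul(x, shard) for shard in split_columns(w, parts)]
    return [sum((p[r] for p in partial), []) for r in range(len(x))]
```

    In a real system the concatenation step corresponds to communication between devices, which is where most of the engineering effort goes.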
    submitted by /u/p0goniphaft111 [link] [comments]  ( 1 min )
  • Open

    PPO exploration on hidden layers
    Hi everyone, I have a question about PPO, but it can be extended to any stochastic policy. Imagine I have a DNN in which the first part is an MLP with 2 outputs, plus 1 additional layer (whatever it is). Usually PPO explores on the DNN's final output by sampling the stochastic policy. The problem is that for my particular application I need the final layer to be deterministic, without exploration, but I would still like to explore on the 2 MLP output neurons. Does this mess up the PPO loss (log probabilities and entropy)? submitted by /u/Tricky_Ad_853 [link] [comments]  ( 1 min )
    What's the best RL lib?
    What's the best RL lib, or in general, what's the best approach to writing distributed RL like R2D2? 1) Stable Baselines 2) Ray RLlib 3) Other 4) Write from scratch? What do you guys use to implement distributed RL? submitted by /u/Defiant_Sun5579 [link] [comments]  ( 1 min )
    When you have a recurrent policy, should you reset the hidden states at the end of each episode or after each step?
    submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
    Using CNNs in Reinforcement Learning
    For the last 18 months, I have been dabbling in DRL, first for my MSc thesis and now for my PhD. However, I have only worked with fixed-size, 1-dimensional state arrays for all of my problems, and have used both baseline algorithms and an adapted version of SAC to better suit my problem needs. Note that even though I really like the main working principles of DRL and would love to dive deeper into the fundamentals, I do not have the required schooling in either mathematics or computer science to really afford going in that direction. I am currently working on a problem for which I think either CNNs or GNNs might be very useful, but most of the implementations I can find use the same CNN structure for the Actor and Critic (& Value) networks without seeming to share it. Does this mean that all of these networks train different CNNs for similar purposes? Wouldn't it make sense to share the CNN layers across all of the networks, as in principle they serve the same purpose: to transform the multi-dimensional input into a 1D input that contains sufficient information for the FC layers? It might be that I understand this completely incorrectly and this is already the case; if it's not, could someone explain to me why you wouldn't want to share these layers between the different networks? submitted by /u/jangroterder [link] [comments]  ( 3 min )
    Changing the learning rate for DQN agent in stable baselines 3
    Hi guys, I am pretty new to the field of reinforcement learning and I am working through it every day. I made a custom environment with one state and 3 actions. I am trying to solve it with DQN, but it is taking too long, around 600k timesteps, to reach the max result. So I thought of changing the learning rate. I could change the learning rate for PPO, but it doesn't seem to work for DQN. It would be really great if someone could help. Thanks in advance submitted by /u/last_2_brain_cells97 [link] [comments]  ( 1 min )
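    For reference, a minimal sketch of how a learning rate is usually passed to a Stable-Baselines3 DQN agent: the `learning_rate` constructor argument accepts either a float or a function of the remaining training progress (which runs from 1 down to 0). The helper names `linear_schedule` and `make_dqn` are made up for this example, and the value `1e-3` is only a placeholder, not a tuned recommendation.

```python
from typing import Callable


def linear_schedule(initial_lr: float) -> Callable[[float], float]:
    """Return a schedule mapping remaining progress (1 -> 0) to a learning rate."""
    def schedule(progress_remaining: float) -> float:
        return progress_remaining * initial_lr
    return schedule


def make_dqn(env, learning_rate=1e-3):
    """Build a DQN agent with an explicit learning rate (float or schedule)."""
    # Imported lazily so the schedule helper works without the library installed.
    from stable_baselines3 import DQN
    return DQN("MlpPolicy", env, learning_rate=learning_rate, verbose=1)
```

    Usage would look like `make_dqn(env, learning_rate=linear_schedule(1e-3)).learn(total_timesteps=100_000)`. The learning rate is fixed when the model is constructed, so assigning a new value to an already-built model may have no effect on its optimizer.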
  • Open

    Logarithmic spiral
    I’ve seen an image similar to the following many times, but I don’t recall any source going into detail regarding how the spiral is constructed. This post will do just that. The previous post constructed iterated golden rectangles. We start with a golden rectangle and imagine chopping off first the blue, then the green, then […] Logarithmic spiral first appeared on John D. Cook.  ( 2 min )
    Iterated golden rectangles in detail
    I’ve seen the illustration of nesting golden rectangles many times, but I’ve never seen a presentation go into much detail. This post will go into more detail than usual, including Python code. Start with a golden rectangle in landscape mode. We’ll plot our rectangle with the lower left corner at the origin and the upper […] Iterated golden rectangles in detail first appeared on John D. Cook.  ( 2 min )
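    The teaser above is truncated, so the following is not the post's code. It is a small independent sketch of the underlying fact: chopping the largest square off a golden rectangle leaves a smaller golden rectangle. Only side lengths are tracked here; positions and orientation, which a plot would need, are left out.

```python
PHI = (1 + 5 ** 0.5) / 2  # golden ratio


def chop_square(width, height):
    """Remove the largest square; return the remaining sides as (long, short)."""
    long_side, short_side = max(width, height), min(width, height)
    return short_side, long_side - short_side


def iterate(n, width=PHI, height=1.0):
    """Side lengths of the starting rectangle and of n successive remainders."""
    rects = [(width, height)]
    for _ in range(n):
        rects.append(chop_square(*rects[-1]))
    return rects
```

    Every pair returned by `iterate` has an aspect ratio equal to the golden ratio up to floating-point error, which is what makes the nesting self-similar.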
  • Open

    Utilize AWS AI services to automate content moderation and compliance
    The daily volume of third-party and user-generated content (UGC) across industries is increasing exponentially. Startups, social media, gaming, and other industries must ensure their customers are protected, while keeping operational costs down. Businesses in the broadcasting and media industries often find it difficult to efficiently add ratings to content pieces and formats to comply with […]  ( 6 min )
    Content moderation design patterns with AWS managed AI services
    User-generated content (UGC) grows exponentially, as well as the requirements and the cost to keep content and online communities safe and compliant. Modern web and mobile platforms fuel businesses and drive user engagement through social features, from startups to large organizations. Online community members expect safe and inclusive experiences where they can freely consume and […]  ( 8 min )
  • Open

    More Freedom on the Freeway: AI Lifts Malaysia’s Toll Barriers
    Working as an aerospace engineer in Malaysia, Chee How Lim dreamed of building a startup that could really take off. Today his company, Tapway, is riding a wave of computer vision and AI adoption in Southeast Asia. A call for help in 2019 with video analytics led to the Kuala Lumpur-based company’s biggest project to […] The post More Freedom on the Freeway: AI Lifts Malaysia’s Toll Barriers appeared first on NVIDIA Blog.  ( 3 min )
  • Open

    Q&A: Chris Rackauckas on the equations at the heart of practically everything
    Have a question about numerical differential equations? Odds are this CSAIL research affiliate has already addressed it.  ( 6 min )
  • Open

    Should the definition of Digital twins include simulation of complex systems?
    In the previous post, we discussed the various definitions of digital twins, and we saw that there is no shortage of them! But here is a question we discussed in class: should the definition of digital twins include simulation of complex systems? In my opinion, simulation is the raison d’être for digital twins. Let me explain (some of the ideas… The post Should the definition of Digital twins include simulation of complex systems? appeared first on Data Science Central.  ( 2 min )
  • Open

    Introduction to Tensorflow.js with Real-World Example
    submitted by /u/RubiksCodeNMZ [link] [comments]
  • Open

    Static Analyzers in Python
    Static analyzers are tools that help you check your code without really running your code. The most basic form of static analyzers is the syntax highlighters in your favorite editors. If you need to compile your code (say, in C++), your compiler, such as LLVM, may also provide some static analyzer functions to warn you […] The post Static Analyzers in Python appeared first on Machine Learning Mastery.  ( 12 min )
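    As a small illustration of checking code without running it, the sketch below uses Python's standard `ast` module to parse source text and report two common issues. The checker functions are made up for this example and are far simpler than real tools; in particular, the unused-import check ignores `__all__`, string references, and star imports.

```python
import ast


def find_bare_excepts(source: str):
    """Return line numbers of `except:` clauses that catch everything."""
    tree = ast.parse(source)
    return [node.lineno for node in ast.walk(tree)
            if isinstance(node, ast.ExceptHandler) and node.type is None]


def find_unused_imports(source: str):
    """Return imported names that never appear as a Name node in the module."""
    tree = ast.parse(source)
    imported, used = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                if alias.name != "*":  # star imports bind unknown names; skip them
                    imported.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.Name):
            used.add(node.id)
    return sorted(imported - used)
```

    Neither function executes the analyzed code: the source is only parsed into a syntax tree and inspected.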
  • Open

    Disentangled and Side-aware Unsupervised Domain Adaptation for Cross-dataset Subjective Tinnitus Diagnosis. (arXiv:2205.03230v1 [eess.SP])
    EEG-based tinnitus classification is a valuable tool for tinnitus diagnosis, research, and treatments. Most current works are limited to a single dataset where data patterns are similar. But EEG signals are highly non-stationary, resulting in poor model generalization to new users, sessions or datasets. Thus, designing a model that can generalize to new datasets is beneficial and indispensable. To mitigate distribution discrepancy across datasets, we propose to achieve Disentangled and Side-aware Unsupervised Domain Adaptation (DSUDA) for cross-dataset tinnitus diagnosis. A disentangled auto-encoder is developed to decouple class-irrelevant information from the EEG signals to improve the classifying ability. The side-aware unsupervised domain adaptation module adapts the class-irrelevant information as domain variance to a new dataset and excludes the variance to obtain the class-distill features for the new dataset classification. It also aligns signals of left and right ears to overcome inherent EEG pattern differences. We compare DSUDA with state-of-the-art methods, and our model achieves significant improvements over competitors regarding comprehensive evaluation criteria. The results demonstrate our model can successfully generalize to a new dataset and effectively diagnose tinnitus.
    Predicting Loose-Fitting Garment Deformations Using Bone-Driven Motion Networks. (arXiv:2205.01355v2 [cs.GR] UPDATED)
    We present a learning algorithm that uses bone-driven motion networks to predict the deformation of loose-fitting garment meshes at interactive rates. Given a garment, we generate a simulation database and extract virtual bones from simulated mesh sequences using skin decomposition. At runtime, we separately compute low- and high-frequency deformations in a sequential manner. The low-frequency deformations are predicted by transferring body motions to virtual bones' motions, and the high-frequency deformations are estimated leveraging the global information of virtual bones' motions and local information extracted from low-frequency meshes. In addition, our method can estimate garment deformations caused by variations of the simulation parameters (e.g., fabric's bending stiffness) using an RBF kernel ensembling trained networks for different sets of simulation parameters. Through extensive comparisons, we show that our method outperforms state-of-the-art methods in terms of prediction accuracy of mesh deformations by about 20% in RMSE and 10% in Hausdorff distance and STED. The code and data are available at https://github.com/non-void/VirtualBones.
    Solar: $L_0$ solution path averaging for fast and accurate variable selection in high-dimensional data. (arXiv:2007.15707v3 [stat.ML] UPDATED)
    We propose a new variable selection algorithm, subsample-ordered least-angle regression (solar), and its coordinate descent generalization, solar-cd. Solar re-constructs lasso paths using the $L_0$ norm and averages the resulting solution paths across subsamples. Path averaging retains the ranking information of the informative variables while averaging out sensitivity to high dimensionality, improving variable selection stability, efficiency, and accuracy. We prove that: (i) with a high probability, path averaging perfectly separates informative variables from redundant variables on the average $L_0$ path; (ii) solar variable selection is consistent and accurate; and (iii) the probability that solar omits weak signals is controllable for finite sample size. We also demonstrate that: (i) solar yields, with less than $1/3$ of the lasso computation load, substantial improvements over lasso in terms of the sparsity (64-84\% reduction in redundant variable selection) and accuracy of variable selection; (ii) compared with the lasso safe/strong rule and variable screening, solar largely avoids selection of redundant variables and rejection of informative variables in the presence of complicated dependence structures; (iii) the sparsity and stability of solar conserves residual degrees of freedom for data-splitting hypothesis testing, improving the accuracy of post-selection inference on weak signals with limited $n$; (iv) replacing lasso with solar in bootstrap selection (e.g., bolasso or stability selection) produces a multi-layer variable ranking scheme that improves selection sparsity and ranking accuracy with the computation load of only one lasso realization; and (v) given the computation resources, solar bootstrap selection is substantially faster (98\% lower computation time) than the theoretical maximum speedup for parallelized bootstrap lasso (confirmed by Amdahl's law).
    A Framework for Evaluating Post Hoc Feature-Additive Explainers. (arXiv:2106.08376v2 [cs.LG] UPDATED)
    Many applications of data-driven models demand transparency of decisions, especially in health care, criminal justice, and other high-stakes environments. Modern trends in machine learning research have led to algorithms that are increasingly intricate to the degree that they are considered to be black boxes. In an effort to reduce the opacity of decisions, methods have been proposed to construe the inner workings of such models in a human-comprehensible manner. These post hoc techniques are described as being universal explainers - capable of faithfully augmenting decisions with algorithmic insight. Unfortunately, there is little agreement about what constitutes a "good" explanation. Moreover, current methods of explanation evaluation are derived from either subjective or proxy means. In this work, we propose a framework for the evaluation of post hoc explainers on ground truth that is directly derived from the additive structure of a model. We demonstrate the efficacy of the framework in understanding explainers by evaluating popular explainers on thousands of synthetic and several real-world tasks. The framework unveils that explanations may be accurate but misattribute the importance of individual features.
    Efficient and passive learning of networked dynamical systems driven by non-white exogenous inputs. (arXiv:2110.00852v3 [cs.LG] UPDATED)
    We consider a networked linear dynamical system with $p$ agents/nodes. We study the problem of learning the underlying graph of interactions/dependencies from observations of the nodal trajectories over a time-interval $T$. We present a regularized non-causal consistent estimator for this problem and analyze its sample complexity over two regimes: (a) where the interval $T$ consists of $n$ i.i.d. observation windows of length $T/n$ (restart and record), and (b) where $T$ is one continuous observation window (consecutive). Using the theory of $M$-estimators, we show that the estimator recovers the underlying interactions, in either regime, in a time-interval that is logarithmic in the system size $p$. To the best of our knowledge, this is the first work to analyze the sample complexity of learning linear dynamical systems driven by unobserved non-white wide-sense stationary (WSS) inputs.  ( 2 min )
    Learning Optimal Conformal Classifiers. (arXiv:2110.09192v3 [cs.LG] UPDATED)
    Modern deep learning based classifiers show very high accuracy on test data but this does not provide sufficient guarantees for safe deployment, especially in high-stakes AI applications such as medical diagnosis. Usually, predictions are obtained without a reliable uncertainty estimate or a formal guarantee. Conformal prediction (CP) addresses these issues by using the classifier's predictions, e.g., its probability estimates, to predict confidence sets containing the true class with a user-specified probability. However, using CP as a separate processing step after training prevents the underlying model from adapting to the prediction of confidence sets. Thus, this paper explores strategies to differentiate through CP during training with the goal of training the model with the conformal wrapper end-to-end. In our approach, conformal training (ConfTr), we specifically "simulate" conformalization on mini-batches during training. Compared to standard training, ConfTr reduces the average confidence set size (inefficiency) of state-of-the-art CP methods applied after training. Moreover, it allows us to "shape" the confidence sets predicted at test time, which is difficult for standard CP. On experiments with several datasets, we show ConfTr can influence how inefficiency is distributed across classes, or guide the composition of confidence sets in terms of the included classes, while retaining the guarantees offered by CP.  ( 2 min )
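    To illustrate what "CP as a separate processing step after training" can look like, here is a minimal sketch of split conformal prediction with a simple threshold score (one minus the probability of the true class). This is a standard post-hoc construction, not the ConfTr training procedure from the paper, and the function names are made up for this example.

```python
import math


def calibrate_threshold(cal_probs, cal_labels, alpha):
    """Conformal quantile of the scores 1 - p(true class) on held-out calibration data."""
    scores = sorted(1.0 - p[y] for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-based rank of the conformal quantile
    # With too few calibration points the set must contain every class.
    return scores[rank - 1] if rank <= n else 1.0


def confidence_set(probs, threshold):
    """All classes whose score 1 - p does not exceed the calibrated threshold."""
    return [k for k, p in enumerate(probs) if 1.0 - p <= threshold]


def inefficiency(prob_rows, threshold):
    """Average confidence set size over a batch of predictions."""
    return sum(len(confidence_set(p, threshold)) for p in prob_rows) / len(prob_rows)
```

    Here `alpha` is the user-specified miscoverage level, so the sets are meant to contain the true class with probability at least `1 - alpha` when calibration and test data are exchangeable. The `inefficiency` helper computes the average set size that the abstract refers to.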
    Differentially private training of residual networks with scale normalisation. (arXiv:2203.00324v2 [cs.LG] UPDATED)
    The training of neural networks with Differentially Private Stochastic Gradient Descent offers formal Differential Privacy guarantees but introduces accuracy trade-offs. In this work, we propose to alleviate these trade-offs in residual networks with Group Normalisation through a simple architectural modification termed ScaleNorm by which an additional normalisation layer is introduced after the residual block's addition operation. Our method allows us to further improve on the recently reported state-of-the-art on CIFAR-10, achieving a top-1 accuracy of 82.5% ($\epsilon = 8.0$) when trained from scratch.  ( 2 min )
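    For readers unfamiliar with Differentially Private Stochastic Gradient Descent, the core update can be sketched in a few lines: clip each per-example gradient, sum the clipped gradients, add Gaussian noise scaled to the clipping bound, and average. This is a generic illustration with made-up names. It does not include the paper's ScaleNorm layer, and it does not compute the privacy budget $\epsilon$, which requires a separate accountant.

```python
import math
import random


def clip(grad, max_norm):
    """Scale a per-example gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(weights, per_example_grads, lr, max_norm, noise_multiplier, rng=random):
    """One DP-SGD update: clip each gradient, sum, add Gaussian noise, average."""
    clipped = [clip(g, max_norm) for g in per_example_grads]
    summed = [sum(col) for col in zip(*clipped)]
    sigma = noise_multiplier * max_norm  # noise std is tied to the clipping bound
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    batch = len(per_example_grads)
    return [w - lr * n / batch for w, n in zip(weights, noisy)]
```

    The accuracy trade-off mentioned in the abstract comes from both steps: clipping biases the gradient, and the added noise increases its variance.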
    Active Offline Policy Selection. (arXiv:2106.10251v4 [cs.LG] UPDATED)
    This paper addresses the problem of policy selection in domains with abundant logged data, but with a restricted interaction budget. Solving this problem would enable safe evaluation and deployment of offline reinforcement learning policies in industry, robotics, and recommendation domains among others. Several off-policy evaluation (OPE) techniques have been proposed to assess the value of policies using only logged data. However, there is still a big gap between the evaluation by OPE and the full online evaluation. Yet, large amounts of online interactions are often not possible in practice. To overcome this problem, we introduce active offline policy selection - a novel sequential decision approach that combines logged data with online interaction to identify the best policy. We use OPE estimates to warm start the online evaluation. Then, in order to utilize the limited environment interactions wisely we decide which policy to evaluate next based on a Bayesian optimization method with a kernel that represents policy similarity. We use multiple benchmarks, including real-world robotics, with a large number of candidate policies to show that the proposed approach improves upon state-of-the-art OPE estimates and pure online policy evaluation.  ( 2 min )
    What Makes A Good Fisherman? Linear Regression under Self-Selection Bias. (arXiv:2205.03246v1 [math.ST])
    In the classical setting of self-selection, the goal is to learn $k$ models, simultaneously from observations $(x^{(i)}, y^{(i)})$ where $y^{(i)}$ is the output of one of $k$ underlying models on input $x^{(i)}$. In contrast to mixture models, where we observe the output of a randomly selected model, here the observed model depends on the outputs themselves, and is determined by some known selection criterion. For example, we might observe the highest output, the smallest output, or the median output of the $k$ models. In known-index self-selection, the identity of the observed model output is observable; in unknown-index self-selection, it is not. Self-selection has a long history in Econometrics and applications in various theoretical and applied fields, including treatment effect estimation, imitation learning, learning from strategically reported data, and learning from markets at disequilibrium. In this work, we present the first computationally and statistically efficient estimation algorithms for the most standard setting of this problem where the models are linear. In the known-index case, we require poly$(1/\varepsilon, k, d)$ sample and time complexity to estimate all model parameters to accuracy $\varepsilon$ in $d$ dimensions, and can accommodate quite general selection criteria. In the more challenging unknown-index case, even the identifiability of the linear models (from infinitely many samples) was not known. We show three results in this case for the commonly studied $\max$ self-selection criterion: (1) we show that the linear models are indeed identifiable, (2) for general $k$ we provide an algorithm with poly$(d) \exp(\text{poly}(k))$ sample and time complexity to estimate the regression parameters up to error $1/\text{poly}(k)$, and (3) for $k = 2$ we provide an algorithm for any error $\varepsilon$ and poly$(d, 1/\varepsilon)$ sample and time complexity.  ( 2 min )
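    The observation model with the $\max$ selection criterion can be simulated in a few lines. The sketch below only generates data; it is not the paper's estimation algorithm. The Gaussian covariates, the Gaussian noise, and all names are assumptions made for illustration.

```python
import random


def sample_max_self_selection(models, n, noise_std=0.1, seed=0):
    """Draw (x, y, index) where y is the largest of the k noisy linear outputs."""
    rng = random.Random(seed)
    d = len(models[0])
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        outputs = [sum(w_j * x_j for w_j, x_j in zip(w, x)) + rng.gauss(0.0, noise_std)
                   for w in models]
        index = max(range(len(models)), key=outputs.__getitem__)
        data.append((x, outputs[index], index))
    return data
```

    Keeping the third element of each tuple corresponds to the known-index setting; dropping it gives the unknown-index setting, in which only the pairs $(x, y)$ are observed.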
    Federated Channel Learning for Intelligent Reflecting Surfaces With Fewer Pilot Signals. (arXiv:2205.03196v1 [eess.SP])
    Channel estimation is a critical task in intelligent reflecting surface (IRS)-assisted wireless systems due to the uncertainties imposed by environment dynamics and rapid changes in the IRS configuration. To deal with these uncertainties, deep learning (DL) approaches have been proposed. Previous works consider a centralized learning (CL) approach for model training, which entails the collection of the whole training dataset from the users at the base station (BS), hence introducing huge transmission overhead for data collection. To address this challenge, this paper proposes a federated learning (FL) framework to jointly estimate both direct and cascaded channels in IRS-assisted wireless systems. We design a single convolutional neural network trained on the local datasets of the users without sending them to the BS. We show that the proposed FL-based channel estimation approach requires approximately 60% fewer pilot signals and exhibits 12 times lower transmission overhead than CL, while maintaining satisfactory performance close to CL. In addition, it provides lower estimation error than the state-of-the-art DL-based schemes.  ( 2 min )
    Electrocardiographic Deep Learning for Predicting Post-Procedural Mortality. (arXiv:2205.03242v1 [eess.SP])
    Background. Pre-operative risk assessments used in clinical practice are limited in their ability to identify risk for post-operative mortality. We hypothesize that electrocardiograms contain hidden risk markers that can help prognosticate post-operative mortality. Methods. In a derivation cohort of 45,969 pre-operative patients (age 59 ± 19 years, 55 percent women), a deep learning algorithm was developed to leverage waveform signals from pre-operative ECGs to discriminate post-operative mortality. Model performance was assessed in a holdout internal test dataset and in two external hospital cohorts and compared with the Revised Cardiac Risk Index (RCRI) score. Results. In the derivation cohort, there were 1,452 deaths. The algorithm discriminates mortality with an AUC of 0.83 (95% CI 0.79-0.87) surpassing the discrimination of the RCRI score with an AUC of 0.67 (CI 0.61-0.72) in the held-out test cohort. Patients determined to be high risk by the deep learning model's risk prediction had an unadjusted odds ratio (OR) of 8.83 (5.57-13.20) for post-operative mortality as compared to an unadjusted OR of 2.08 (CI 0.77-3.50) for post-operative mortality for RCRI greater than 2. The deep learning algorithm performed similarly for patients undergoing cardiac surgery with an AUC of 0.85 (CI 0.77-0.92), non-cardiac surgery with an AUC of 0.83 (0.79-0.88), and catheterization or endoscopy suite procedures with an AUC of 0.76 (0.72-0.81). The algorithm similarly discriminated risk for mortality in two separate external validation cohorts from independent healthcare systems with AUCs of 0.79 (0.75-0.83) and 0.75 (0.74-0.76) respectively. Conclusion. The findings demonstrate how a novel deep learning algorithm, applied to pre-operative ECGs, can improve discrimination of post-operative mortality.
    Convex Analysis at Infinity: An Introduction to Astral Space. (arXiv:2205.03260v1 [math.OC])
    Not all convex functions on $\mathbb{R}^n$ have finite minimizers; some can only be minimized by a sequence as it heads to infinity. In this work, we aim to develop a theory for understanding such minimizers at infinity. We study astral space, a compact extension of $\mathbb{R}^n$ to which such points at infinity have been added. Astral space is constructed to be as small as possible while still ensuring that all linear functions can be continuously extended to the new space. Although astral space includes all of $\mathbb{R}^n$, it is not a vector space, nor even a metric space. However, it is sufficiently well-structured to allow useful and meaningful extensions of concepts of convexity, conjugacy, and subdifferentials. We develop these concepts and analyze various properties of convex functions on astral space, including the detailed structure of their minimizers, exact characterizations of continuity, and convergence of descent algorithms.  ( 2 min )
    Implementation of a Binary Neural Network on a Passive Array of Magnetic Tunnel Junctions. (arXiv:2112.09159v2 [cs.ET] UPDATED)
    The increasing scale of neural networks and their growing application space have produced demand for more energy- and memory-efficient artificial-intelligence-specific hardware. Avenues to mitigate the main issue, the von Neumann bottleneck, include in-memory and near-memory architectures, as well as algorithmic approaches. Here we leverage the low-power and the inherently binary operation of magnetic tunnel junctions (MTJs) to demonstrate neural network hardware inference based on passive arrays of MTJs. In general, transferring a trained network model to hardware for inference is confronted by degradation in performance due to device-to-device variations, write errors, parasitic resistance, and nonidealities in the substrate. To quantify the effect of these hardware realities, we benchmark 300 unique weight matrix solutions of a 2-layer perceptron to classify the Wine dataset for both classification accuracy and write fidelity. Despite device imperfections, we achieve software-equivalent accuracy of up to 95.3 % with proper tuning of network parameters in 15 x 15 MTJ arrays having a range of device sizes. The success of this tuning process shows that new metrics are needed to characterize the performance and quality of networks reproduced in mixed signal hardware.  ( 2 min )
    Fine-tuning wav2vec2 for speaker recognition. (arXiv:2109.15053v2 [cs.SD] UPDATED)
    This paper explores applying the wav2vec2 framework to speaker recognition instead of speech recognition. We study the effectiveness of the pre-trained weights on the speaker recognition task, and how to pool the wav2vec2 output sequence into a fixed-length speaker embedding. To adapt the framework to speaker recognition, we propose a single-utterance classification variant with CE or AAM softmax loss, and an utterance-pair classification variant with BCE loss. Our best performing variant, w2v2-aam, achieves a 1.88% EER on the extended voxceleb1 test set compared to 1.69% EER with an ECAPA-TDNN baseline. Code is available at https://github.com/nikvaessen/w2v2-speaker.  ( 2 min )
    Incremental Data-Uploading for Full-Quantum Classification. (arXiv:2205.03057v1 [quant-ph])
    The data representation in a machine-learning model strongly influences its performance. This becomes even more important for quantum machine learning models implemented on noisy intermediate scale quantum (NISQ) devices. Encoding high dimensional data into a quantum circuit for a NISQ device without any loss of information is not trivial and brings a lot of challenges. While simple encoding schemes (like single qubit rotational gates to encode high dimensional data) often lead to information loss within the circuit, complex encoding schemes with entanglement and data re-uploading lead to an increase in the encoding gate count. This is not well-suited for NISQ devices. This work proposes 'incremental data-uploading', a novel encoding pattern for high dimensional data that tackles these challenges. We spread the encoding gates for the feature vector of a given data point throughout the quantum circuit with parameterized gates in between them. This encoding pattern results in a better representation of data in the quantum circuit with a minimal pre-processing requirement. We show the efficiency of our encoding pattern on a classification task using the MNIST and Fashion-MNIST datasets, and compare different encoding methods via classification accuracy and the effective dimension of the model.  ( 2 min )
    Gaussian Processes for Missing Value Imputation. (arXiv:2204.04648v2 [stat.ML] UPDATED)
    Missing values are common in many real-life datasets. However, most of the current machine learning methods cannot handle missing values. This means that they should be imputed beforehand. Gaussian Processes (GPs) are non-parametric models with accurate uncertainty estimates that, combined with sparse approximations and stochastic variational inference, scale to large data sets. Sparse GPs can be used to compute a predictive distribution for missing data. Here, we present a hierarchical composition of sparse GPs that is used to predict missing values at each dimension using all the variables from the other dimensions. We call the approach missing GP (MGP). MGP can be trained simultaneously to impute all observed missing values. Specifically, it outputs a predictive distribution for each missing value that is then used in the imputation of other missing values. We evaluate MGP in one private clinical data set and four UCI datasets with a different percentage of missing values. We compare the performance of MGP with other state-of-the-art methods for imputing missing values, including variants based on sparse GPs and deep GPs. The results obtained show a significantly better performance of MGP.  ( 2 min )
    Defending against Reconstruction Attacks through Differentially Private Federated Learning for Classification of Heterogeneous Chest X-Ray Data. (arXiv:2205.03168v1 [cs.LG])
    Privacy regulations and the physical distribution of heterogeneous data are often primary concerns for the development of deep learning models in a medical context. This paper evaluates the feasibility of differentially private federated learning for chest X-ray classification as a defense against privacy attacks on DenseNet121 and ResNet50 network architectures. We simulated a federated environment by distributing images from the public CheXpert and Mendeley chest X-ray datasets unevenly among 36 clients. Both non-private baseline models achieved an area under the ROC curve (AUC) of 0.94 on the binary classification task of detecting the presence of a medical finding. We demonstrate that both model architectures are vulnerable to privacy violation by applying image reconstruction attacks to local model updates from individual clients. The attack was particularly successful during later training stages. To mitigate the risk of privacy breach, we integrated Rényi differential privacy with a Gaussian noise mechanism into local model training. We evaluate model performance and attack vulnerability for privacy budgets $\epsilon \in \{1, 3, 6, 10\}$. The DenseNet121 achieved the best utility-privacy trade-off with an AUC of 0.94 for $\epsilon = 6$. Model performance deteriorated slightly for individual clients compared to the non-private baseline. The ResNet50 only reached an AUC of 0.76 in the same privacy setting. Its performance was inferior to that of the DenseNet121 for all considered privacy constraints, suggesting that the DenseNet121 architecture is more robust to differentially private training.  ( 2 min )
    Side-aware Meta-Learning for Cross-Dataset Listener Diagnosis with Subjective Tinnitus. (arXiv:2205.03231v1 [eess.SP])
    With the development of digital technology, machine learning has paved the way for the next generation of tinnitus diagnoses. Although machine learning has been widely applied in EEG-based tinnitus analysis, most current models are dataset-specific. Each dataset may be limited to a specific range of symptoms, overall disease severity, and demographic attributes; further, dataset formats may differ, impacting model performance. This paper proposes a side-aware meta-learning method for cross-dataset tinnitus diagnosis, which can effectively classify tinnitus in subjects of divergent ages and genders from different data collection processes. Owing to the superiority of meta-learning, our method does not rely on large-scale datasets like conventional deep learning models. Moreover, we design a subject-specific training process to assist the model in fitting the data pattern of different patients or healthy people. Our method achieves a high accuracy of 73.8% in the cross-dataset classification. We conduct an extensive analysis to show the effectiveness of side information of ears in enhancing model performance and of side-aware meta-learning in improving the quality of the learned features.
    IMU Based Deep Stride Length Estimation With Self-Supervised Learning. (arXiv:2205.02977v1 [cs.LG])
    Stride length estimation using inertial measurement unit (IMU) sensors has recently become popular, as stride length is a representative gait parameter for health care and sports training. The traditional estimation method requires explicit calibrations and design assumptions. Current deep learning methods suffer from a scarcity of labeled data. To solve these problems, this paper proposes a single convolutional neural network (CNN) model to predict the stride length of running and walking and to classify each stride as running or walking. The model trains its pretext task with self-supervised learning on a large unlabeled dataset for feature learning, and its downstream stride length estimation and classification tasks with supervised learning on a small labeled dataset. The proposed model achieves a better average percent error, 4.78%, on running and walking stride length regression, compared to 7.44% for the previous approach, and 99.83% accuracy on running and walking classification.  ( 2 min )
    Symphony: Learning Realistic and Diverse Agents for Autonomous Driving Simulation. (arXiv:2205.03195v1 [cs.LG])
    Simulation is a crucial tool for accelerating the development of autonomous vehicles. Making simulation realistic requires models of the human road users who interact with such cars. Such models can be obtained by applying learning from demonstration (LfD) to trajectories observed by cars already on the road. However, existing LfD methods are typically insufficient, yielding policies that frequently collide or drive off the road. To address this problem, we propose Symphony, which greatly improves realism by combining conventional policies with a parallel beam search. The beam search refines these policies on the fly by pruning branches that are unfavourably evaluated by a discriminator. However, it can also harm diversity, i.e., how well the agents cover the entire distribution of realistic behaviour, as pruning can encourage mode collapse. Symphony addresses this issue with a hierarchical approach, factoring agent behaviour into goal generation and goal conditioning. The use of such goals ensures that agent diversity neither disappears during adversarial training nor is pruned away by the beam search. Experiments on both proprietary and open Waymo datasets confirm that Symphony agents learn more realistic and diverse behaviour than several baselines.  ( 2 min )
    Trainable Wavelet Neural Network for Non-Stationary Signals. (arXiv:2205.03355v1 [cs.LG])
    This work introduces a wavelet neural network to learn a filter-bank specialized to fit non-stationary signals and improve interpretability and performance for digital signal processing. The network uses a wavelet transform as the first layer of a neural network where the convolution is a parameterized function of the complex Morlet wavelet. Experimental results, on both simplified data and atmospheric gravity waves, show the network is quick to converge, generalizes well on noisy data, and outperforms standard network architectures.
    Meta-Knowledge Transfer for Inductive Knowledge Graph Embedding. (arXiv:2110.14170v3 [cs.LG] UPDATED)
    Knowledge graphs (KGs) consisting of a large number of triples have become widespread recently, and many knowledge graph embedding (KGE) methods have been proposed to embed entities and relations of a KG into continuous vector spaces. Such embedding methods simplify the operations of conducting various in-KG tasks (e.g., link prediction) and out-of-KG tasks (e.g., question answering). They can be viewed as general solutions for representing KGs. However, existing KGE methods are not applicable to inductive settings, where a model trained on source KGs will be tested on target KGs with entities unseen during model training. Existing works focusing on KGs in inductive settings can only solve the inductive relation prediction task. They cannot handle other out-of-KG tasks as generally as KGE methods can, since they do not produce embeddings for entities. In this paper, to achieve inductive knowledge graph embedding, we propose a model MorsE, which does not learn embeddings for entities but learns transferable meta-knowledge that can be used to produce entity embeddings. Such meta-knowledge is modeled by entity-independent modules and learned by meta-learning. Experimental results show that our model significantly outperforms corresponding baselines for in-KG and out-of-KG tasks in inductive settings.  ( 2 min )
    Synthetic Data -- what, why and how?. (arXiv:2205.03257v1 [cs.LG])
    This explainer document aims to provide an overview of the current state of the rapidly expanding work on synthetic data technologies, with a particular focus on privacy. The article is intended for a non-technical audience, though some formal definitions have been given to provide clarity to specialists. This article is intended to enable the reader to quickly become familiar with the notion of synthetic data, as well as understand some of the subtle intricacies that come with it. We do believe that synthetic data is a very useful tool, and our hope is that this report highlights that, while drawing attention to nuances that can easily be overlooked in its deployment.  ( 2 min )
    Application of Clustering Algorithms for Dimensionality Reduction in Infrastructure Resilience Prediction Models. (arXiv:2205.03316v1 [cs.LG])
    Recent studies increasingly adopt simulation-based machine learning (ML) models to analyze critical infrastructure system resilience. For realistic applications, these ML models consider the component-level characteristics that influence the network response during emergencies. However, such an approach could result in a large number of features and cause ML models to suffer from the 'curse of dimensionality'. We present a clustering-based method that simultaneously minimizes the problem of high-dimensionality and improves the prediction accuracy of ML models developed for resilience analysis in large-scale interdependent infrastructure networks. The methodology has three parts: (a) generation of simulation dataset, (b) network component clustering, and (c) dimensionality reduction and development of prediction models. First, an interdependent infrastructure simulation model simulates the network-wide consequences of various disruptive events. The component-level features are extracted from the simulated data. Next, clustering algorithms are used to derive the cluster-level features by grouping component-level features based on their topological and functional characteristics. Finally, ML algorithms are used to develop models that predict the network-wide impacts of disruptive events using the cluster-level features. The applicability of the method is demonstrated using an interdependent power-water-transport testbed. The proposed method can be used to develop decision-support tools for post-disaster recovery of infrastructure networks.
    Physics-informed neural networks for PDE-constrained optimization and control. (arXiv:2205.03377v1 [cs.LG])
    A fundamental problem of science is designing optimal control policies that manipulate a given environment into producing a desired outcome. Control Physics-Informed Neural Networks (Control PINNs) simultaneously solve for a given system state and its respective optimal control in a one-stage framework that conforms to the physical laws of the system. Prior approaches use a two-stage framework that models and controls a system sequentially, whereas Control PINNs incorporate the required optimality conditions in their architecture and loss function. The success of Control PINNs is demonstrated by solving the following open-loop optimal control problems: (i) an analytical problem, (ii) a one-dimensional heat equation, and (iii) a two-dimensional predator-prey problem.  ( 2 min )
    Generative Adversarial Neural Operators. (arXiv:2205.03017v1 [cs.LG])
    We propose the generative adversarial neural operator (GANO), a generative model paradigm for learning probabilities on infinite-dimensional function spaces. The natural sciences and engineering are known to have many types of data that are sampled from infinite-dimensional function spaces, where classical finite-dimensional deep generative adversarial networks (GANs) may not be directly applicable. GANO generalizes the GAN framework and allows for the sampling of functions by learning push-forward operator maps in infinite-dimensional spaces. GANO consists of two main components, a generator neural operator and a discriminator neural functional. The inputs to the generator are samples of functions from a user-specified probability measure, e.g., Gaussian random field (GRF), and the generator outputs are synthetic data functions. The input to the discriminator is either a real or synthetic data function. In this work, we instantiate GANO using the Wasserstein criterion and show how the Wasserstein loss can be computed in infinite-dimensional spaces. We empirically study GANOs in controlled cases where both input and output functions are samples from GRFs and compare their performance to that of the finite-dimensional counterpart GAN. We empirically study the efficacy of GANO on real-world function data of volcanic activities and show its superior performance over GAN. Furthermore, we find that for the function-based data considered, GANOs are more stable to train than GANs and require less hyperparameter optimization.
    Hausa Visual Genome: A Dataset for Multi-Modal English to Hausa Machine Translation. (arXiv:2205.01133v2 [cs.CL] UPDATED)
    Multi-modal Machine Translation (MMT) enables the use of visual information to enhance the quality of translations. The visual information can serve as a valuable piece of context information to decrease the ambiguity of input sentences. Despite the increasing popularity of such a technique, good and sizeable datasets are scarce, limiting the full extent of their potential. Hausa, a Chadic language, is a member of the Afro-Asiatic language family. It is estimated that about 100 to 150 million people speak the language, with more than 80 million indigenous speakers. This is more than any of the other Chadic languages. Despite a large number of speakers, the Hausa language is considered low-resource in natural language processing (NLP). This is due to the absence of sufficient resources to implement most NLP tasks. While some datasets exist, they are either scarce, machine-generated, or in the religious domain. Therefore, there is a need to create training and evaluation data for implementing machine learning tasks and bridging the research gap in the language. This work presents the Hausa Visual Genome (HaVG), a dataset that contains the description of an image or a section within the image in Hausa and its equivalent in English. To prepare the dataset, we started by translating the English description of the images in the Hindi Visual Genome (HVG) into Hausa automatically. Afterward, the synthetic Hausa data was carefully post-edited considering the respective images. The dataset comprises 32,923 images and their descriptions that are divided into training, development, test, and challenge test set. The Hausa Visual Genome is the first dataset of its kind and can be used for Hausa-English machine translation, multi-modal research, and image description, among various other natural language processing and generation tasks.  ( 3 min )
    Atlas-powered deep learning (ADL) -- application to diffusion weighted MRI. (arXiv:2205.03210v1 [physics.med-ph])
    Deep learning has great potential for estimating biomarkers in diffusion weighted magnetic resonance imaging (dMRI). Atlases, on the other hand, are a unique tool for modeling the spatio-temporal variability of biomarkers. In this paper, we propose the first framework to exploit both deep learning and atlases for biomarker estimation in dMRI. Our framework relies on non-linear diffusion tensor registration to compute biomarker atlases and to estimate atlas reliability maps. We also use non-linear tensor registration to align the atlas to a subject and to estimate the error of this alignment. We use the biomarker atlas, atlas reliability map, and alignment error map, in addition to the dMRI signal, as inputs to a deep learning model for biomarker estimation. We use our framework to estimate fractional anisotropy and neurite orientation dispersion from down-sampled dMRI data on a test cohort of 70 newborn subjects. Results show that our method significantly outperforms standard estimation methods as well as recent deep learning techniques. Our method is also more robust to stronger measurement down-sampling factors. Our study shows that the advantages of deep learning and atlases can be synergistically combined to achieve unprecedented accuracy in biomarker estimation from dMRI data.
    Efficient Minimax Optimal Estimators For Multivariate Convex Regression. (arXiv:2205.03368v1 [math.ST])
    We study the computational aspects of the task of multivariate convex regression in dimension $d \geq 5$. We present the first computationally efficient minimax optimal (up to logarithmic factors) estimators for the tasks of (i) $L$-Lipschitz convex regression and (ii) $\Gamma$-bounded convex regression under polytopal support. The proof of the correctness of these estimators uses a variety of tools from different disciplines, among them empirical process theory, stochastic geometry, and potential theory. This work is the first to show the existence of efficient minimax optimal estimators for non-Donsker classes whose corresponding Least Squares Estimators are provably minimax sub-optimal, a result of independent interest.  ( 2 min )
    Optimal Control as Variational Inference. (arXiv:2205.03279v1 [cs.LG])
    In this article we address the stochastic and risk-sensitive optimal control problem probabilistically, and we decompose and solve the probabilistic models using principles from variational inference. We demonstrate how this culminates in two separate probabilistic inference procedures that allow the deterministic optimal policy to be inferred iteratively. More formally, a sequence of belief policies, as a probabilistic proxy for the deterministic optimal policy, is specified through a fixed point iteration with the equilibrium point coinciding with the deterministic solution. These results re-establish the paradigm of Control as Inference, a concept explored and exploited originally by the Reinforcement Learning community anticipating deep-rooted connections between optimal estimation and control. Although the Control as Inference paradigm has already resulted in the development of several Reinforcement Learning algorithms, until now the underlying mechanisms were only partially understood. For that very reason, Control as Inference has not been well received by the control community. By exposing the underlying mechanisms we aim to contribute to its general acceptance as a framework superseding optimal control. In order to exhibit its general relevance we discuss parallels with path integral control and discuss a wide range of possible applications.  ( 2 min )
    A Highly Adaptive Acoustic Model for Accurate Multi-Dialect Speech Recognition. (arXiv:2205.03027v1 [cs.LG])
    Despite the success of deep learning in speech recognition, multi-dialect speech recognition remains a difficult problem. Although dialect-specific acoustic models are known to perform well in general, they are not easy to maintain when dialect-specific data is scarce and the number of dialects for each language is large. Therefore, a single unified acoustic model (AM) that generalizes well for many dialects has been in demand. In this paper, we propose a novel acoustic modeling technique for accurate multi-dialect speech recognition with a single AM. Our proposed AM is dynamically adapted based on both dialect information and its internal representation, which results in a highly adaptive AM for handling multiple dialects simultaneously. We also propose a simple but effective training method to deal with unseen dialects. The experimental results on large scale speech datasets show that the proposed AM outperforms all the previous ones, reducing word error rates (WERs) by 8.11% relative compared to a single all-dialects AM and by 7.31% relative compared to dialect-specific AMs.  ( 2 min )
    Predicting Chemical Hazard across Taxa through Machine Learning. (arXiv:2110.03688v3 [q-bio.QM] UPDATED)
    We applied machine learning methods to predict chemical hazards focusing on fish acute toxicity across taxa. We analyzed the relevance of taxonomy and experimental setup, showing that taking them into account can lead to considerable improvements in the classification performance. We quantified the gain obtained through the introduction of taxonomic and experimental information, compared to classification based on chemical information alone. We used our approach with standard machine learning models (K-nearest neighbors, random forests and deep neural networks), as well as the recently proposed Read-Across Structure Activity Relationship (RASAR) models, which were very successful in predicting chemical hazards to mammals based on chemical similarity. We were able to obtain accuracies of over 93% on datasets where, due to noise in the data, the maximum achievable accuracy was expected to be below 96%. The best performances were obtained by random forests and RASAR models. We analyzed metrics to compare our results with animal test reproducibility, and although most of our models "outperform animal test reproducibility" as measured through recently proposed metrics, we showed that the comparison between machine learning performance and animal test reproducibility should be addressed with particular care. While we focused on fish mortality, our approach, provided that the right data is available, is valid for any combination of chemicals, effects and taxa.  ( 2 min )
    Altering backward pass gradients improves convergence. (arXiv:2111.12495v2 [cs.LG] UPDATED)
    In typical neural network training, the gradients in the backward pass are determined by the forward pass. As a result, the two stages are coupled. However, it is often seen that neural networks perform worse when gradients explode or decline. To address this, numerous approaches like Gradient Clipping (GC) and Adaptive Gradient Clipping (AGC) have been developed to enhance the gradient behaviour of networks without normalization layers during backward passes. These techniques decouple the backward and forward passes and modify the gradients adaptively. A possible drawback of clipping approaches is that they must be calculated for each weight tensor in each layer. We offer the PowerGrad Transform (PGT), a comparable approach that alters and enhances the gradient flow behaviour in the backward pass but is calculated only in the final softmax layer. It is very computationally efficient and outperforms both GC and AGC, resulting in improved performance in networks without batch normalization. PGT is easy to integrate into existing networks, requiring just a few lines of code, and significantly increases performance in non-BN ResNets. The impact is more pronounced on big datasets such as ImageNet, where networks do not fit all of the training data and there is some training headroom. PGT makes it possible for the network to better fit the training data while simultaneously improving its performance on the test set.  ( 2 min )
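    For context on the clipping baselines named above, here is a hedged sketch of Adaptive Gradient Clipping, not of PGT itself. It assumes the commonly described AGC rule, where a gradient is rescaled when the ratio of its norm to the weight norm exceeds a threshold, and simplifies it to whole tensors held as flat lists rather than per-unit rows. The function names and the default `clip_factor` and `eps` values are illustrative assumptions.

```python
import math
from typing import List


def l2_norm(values: List[float]) -> float:
    """Euclidean norm of a flat list of numbers."""
    return math.sqrt(sum(v * v for v in values))


def adaptive_clip(
    grad: List[float],
    weight: List[float],
    clip_factor: float = 0.01,
    eps: float = 1e-3,
) -> List[float]:
    """Rescale grad so that ||grad|| <= clip_factor * max(||weight||, eps).

    Simplification: the norm is taken over the whole tensor, whereas AGC is
    usually applied unit-wise (per output row of each weight matrix).
    """
    max_norm = clip_factor * max(l2_norm(weight), eps)
    g_norm = l2_norm(grad)
    if g_norm > max_norm:
        scale = max_norm / g_norm
        return [g * scale for g in grad]
    return list(grad)


if __name__ == "__main__":
    # One call per weight tensor, which is the per-layer cost the abstract notes.
    print(adaptive_clip([3.0, 4.0], [3.0, 4.0], clip_factor=0.1))
```

    Because the threshold depends on each tensor's own weight norm, the function has to be called once per tensor, which illustrates the overhead that a transform applied only at the final softmax layer would avoid.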
    Longitudinal cardio-respiratory fitness prediction through free-living wearable sensors. (arXiv:2205.03116v1 [cs.LG])
    Cardiorespiratory fitness is an established predictor of metabolic disease and mortality. Fitness is directly measured as maximal oxygen consumption (VO2max), or indirectly assessed using heart rate response to a standard exercise test. However, such testing is costly and burdensome, limiting its utility and scalability. Fitness can also be approximated using resting heart rate and self-reported exercise habits but with lower accuracy. Modern wearables capture dynamic heart rate data which, in combination with machine learning models, could improve fitness prediction. In this work, we analyze movement and heart rate signals from wearable sensors in free-living conditions from 11,059 participants who also underwent a standard exercise test, along with a longitudinal repeat cohort of 2,675 participants. We design algorithms and models that convert raw sensor data into cardio-respiratory fitness estimates, and validate these estimates' ability to capture fitness profiles in a longitudinal cohort over time while subjects engaged in real-world (non-exercise) behaviour. Additionally, we validate our methods with a third external cohort of 181 participants who underwent maximal VO2max testing, which is considered the gold standard measurement because it requires reaching one's maximum heart rate and exhaustion level. Our results show that the developed models yield a high correlation (r = 0.82, 95% CI 0.80-0.83) when compared to the ground truth in a holdout sample. These models outperform conventional non-exercise fitness models and traditional bio-markers using measurements of normal daily living without the need for a specific exercise test. Additionally, we show the adaptability and applicability of this approach for detecting fitness change over time in the longitudinal subsample that repeated measurements after 7 years.  ( 2 min )
    Offense Detection in Dravidian Languages using Code-Mixing Index based Focal Loss. (arXiv:2111.06916v2 [cs.CL] UPDATED)
    Over the past decade, we have seen exponential growth in online content fueled by social media platforms. Data generation at this scale comes with the caveat of an overwhelming amount of offensive content. The complexity of identifying offensive content is exacerbated by the usage of multiple modalities (image, language, etc.), code-mixed language and more. Moreover, even after careful sampling and annotation of offensive content, there will always exist a significant class imbalance between offensive and non-offensive content. In this paper, we introduce a novel Code-Mixing Index (CMI) based focal loss which addresses two challenges in Dravidian language offense detection: (1) code-mixing in languages and (2) class imbalance. We also replace the conventional dot product-based classifier with a cosine-based classifier, which results in a boost in performance. Further, we use multilingual models that help transfer characteristics learnt across languages to work effectively with low-resource languages. It is also important to note that our model handles instances of mixed script (say, usage of Latin and Dravidian-Tamil script) as well. To summarize, our model can handle offensive language detection in a low-resource, class-imbalanced, multilingual and code-mixed setting.  ( 2 min )
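    To make the class-imbalance part of the abstract above concrete, the following is a minimal sketch of the standard focal loss for a single example, $-\alpha (1 - p_t)^{\gamma} \log p_t$, where $p_t$ is the predicted probability of the true class. It is the plain focal loss only: how the paper folds the Code-Mixing Index into the loss is not stated in the abstract and is not reproduced here. The parameter names and defaults are illustrative.

```python
import math


def focal_loss(prob_true_class: float, gamma: float = 2.0, alpha: float = 1.0) -> float:
    """Standard focal loss for one example.

    prob_true_class is the model's probability for the correct label.
    gamma down-weights easy (high-probability) examples; gamma = 0
    recovers ordinary cross-entropy scaled by alpha.
    """
    # Clamp to avoid log(0) for degenerate predictions.
    p = min(max(prob_true_class, 1e-12), 1.0)
    return -alpha * (1.0 - p) ** gamma * math.log(p)


if __name__ == "__main__":
    for p in (0.1, 0.5, 0.9):
        print(p, focal_loss(p, gamma=0.0), focal_loss(p, gamma=2.0))
```

    Well-classified examples contribute far less than hard ones as `gamma` grows, so the abundant non-offensive class does not dominate the gradient.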
    SKILL-IL: Disentangling Skill and Knowledge in Multitask Imitation Learning. (arXiv:2205.03130v1 [cs.LG])
    In this work, we introduce a new perspective for learning transferable content in multi-task imitation learning. Humans are able to transfer skills and knowledge. If we can cycle to work and drive to the store, we can also cycle to the store and drive to work. We take inspiration from this and hypothesize the latent memory of a policy network can be disentangled into two partitions. These contain either the knowledge of the environmental context for the task or the generalizable skill needed to solve the task. This allows improved training efficiency and better generalization over previously unseen combinations of skills in the same environment, and the same task in unseen environments. We used the proposed approach to train a disentangled agent for two different multi-task IL environments. In both cases we out-performed the SOTA by 30% in task success rate. We also demonstrated this for navigation on a real robot.  ( 2 min )
    Echocardiography Segmentation with Enforced Temporal Consistency. (arXiv:2112.02102v2 [eess.IV] UPDATED)
    Convolutional neural networks (CNNs) have demonstrated their ability to segment 2D cardiac ultrasound images. However, despite recent successes according to which the intra-observer variability on end-diastole and end-systole images has been reached, CNNs still struggle to leverage temporal information to provide accurate and temporally consistent segmentation maps across the whole cycle. Such consistency is required to accurately describe the cardiac function, a necessary step in diagnosing many cardiovascular diseases. In this paper, we propose a framework to learn the 2D+time apical long-axis cardiac shape such that the segmented sequences can benefit from temporal and anatomical consistency constraints. Our method is a post-processing step that takes as input segmented echocardiographic sequences produced by any state-of-the-art method and processes them in two steps to (i) identify spatio-temporal inconsistencies according to the overall dynamics of the cardiac sequence and (ii) correct the inconsistencies. The identification and correction of cardiac inconsistencies rely on a constrained autoencoder trained to learn a physiologically interpretable embedding of cardiac shapes, where we can both detect and fix anomalies. We tested our framework on 98 full-cycle sequences from the CAMUS dataset, which are available alongside this paper. Our temporal regularization method not only improves the accuracy of the segmentation across the whole sequences, but also enforces temporal and anatomical consistency.  ( 2 min )
    Variance Reduction based Partial Trajectory Reuse to Accelerate Policy Gradient Optimization. (arXiv:2205.02976v1 [cs.LG])
    We extend the idea underlying the success of green simulation assisted policy gradient (GS-PG) to partial historical trajectory reuse for infinite-horizon Markov Decision Processes (MDPs). The existing GS-PG method was designed to learn from complete episodes or process trajectories, which limits its applicability to low-data environments and online process control. In this paper, mixture likelihood ratio (MLR) based policy gradient estimation is used to leverage the information from historical state decision transitions generated under different behavioral policies. We propose a variance reduction experience replay (VRER) approach that can intelligently select and reuse the most relevant transition observations, improve the policy gradient estimation accuracy, and accelerate the learning of the optimal policy. Then we create a process control strategy by incorporating VRER into state-of-the-art step-based policy optimization approaches such as the actor-critic method and proximal policy optimization. The empirical study demonstrates that the proposed policy gradient methodology can significantly outperform the existing policy optimization approaches.  ( 2 min )
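    As a rough illustration of the mixture likelihood ratio idea mentioned above, the sketch below weights one historical transition by the target policy's action probability divided by the average probability under all behavioral policies that generated the replay data. This assumes the usual equal-weight mixture form of the MLR; the abstract does not give the exact estimator, and the function name and arguments are hypothetical.

```python
from typing import Sequence


def mixture_likelihood_ratio(target_prob: float, behavior_probs: Sequence[float]) -> float:
    """Importance weight of one (state, action) pair under an equal-weight mixture.

    target_prob: probability of the action under the current (target) policy.
    behavior_probs: probability of the same action under each past policy.
    """
    mixture = sum(behavior_probs) / len(behavior_probs)
    return target_prob / mixture


if __name__ == "__main__":
    behaviors = [0.25, 0.75]
    # Individual ratios against each behavior policy can be large...
    print([0.5 / b for b in behaviors])
    # ...while the mixture ratio stays moderate.
    print(mixture_likelihood_ratio(0.5, behaviors))
```

    If the target policy is itself one of the K mixture components, the mixture weight can never exceed K, which is one intuition for why mixing tends to reduce variance relative to individual likelihood ratios.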
    Transferring Adversarial Robustness Through Robust Representation Matching. (arXiv:2202.09994v2 [cs.LG] UPDATED)
    With the widespread use of machine learning, concerns over its security and reliability have become prevalent. As such, many have developed defenses to harden neural networks against adversarial examples, imperceptibly perturbed inputs that are reliably misclassified. Adversarial training in which adversarial examples are generated and used during training is one of the few known defenses able to reliably withstand such attacks against neural networks. However, adversarial training imposes a significant training overhead and scales poorly with model complexity and input dimension. In this paper, we propose Robust Representation Matching (RRM), a low-cost method to transfer the robustness of an adversarially trained model to a new model being trained for the same task irrespective of architectural differences. Inspired by student-teacher learning, our method introduces a novel training loss that encourages the student to learn the teacher's robust representations. Compared to prior works, RRM is superior with respect to both model performance and adversarial training time. On CIFAR-10, RRM trains a robust model $\sim 1.8\times$ faster than the state-of-the-art. Furthermore, RRM remains effective on higher-dimensional datasets. On Restricted-ImageNet, RRM trains a ResNet50 model $\sim 18\times$ faster than standard adversarial training.  ( 2 min )
    Bayesian Sample Size Prediction for Online Activity. (arXiv:2111.12157v2 [stat.ML] UPDATED)
    In many contexts it is useful to predict the number of individuals in some population who will initiate a particular activity during a given period. For example, the number of users who will install a software update, the number of customers who will use a new feature on a website or who will participate in an A/B test. In practical settings, there is heterogeneity amongst individuals with regard to the distribution of time until they will initiate. For these reasons it is inappropriate to assume that the number of new individuals observed on successive days will be identically distributed. Given observations on the number of unique users participating in an initial period, we present a simple but novel Bayesian method for predicting the number of additional individuals who will participate during a subsequent period. We illustrate the performance of the method in predicting sample size in online experimentation.  ( 2 min )
    Journaling Data for Daily PHQ-2 Depression Prediction and Forecasting. (arXiv:2205.03391v1 [cs.LG])
    Digital health applications are becoming increasingly important for assessing and monitoring the wellbeing of people suffering from mental health conditions like depression. A common target of such applications is to predict the results of self-assessed Patient-Health-Questionnaires (PHQ), indicating the current symptom severity of depressive individuals. In this work, we explore the potential of using actively-collected data to predict and forecast daily PHQ-2 scores on a newly-collected longitudinal dataset. Using leave-one-subject-out cross-validation, we obtain a best MAE of 1.417 for daily prediction of PHQ-2 scores, which in this dataset range from 0 to 12, as well as a best MAE of 1.914 for forecasting PHQ-2 scores using data from up to the last 7 days. This illustrates the added value that can be obtained by incorporating actively-collected data in a depression monitoring application.  ( 2 min )
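    The evaluation protocol named above, mean absolute error (MAE) under leave-one-subject-out cross-validation, can be sketched as follows. The model here is a deliberately trivial stand-in that predicts the mean score of the training subjects; the paper's actual models and features are not described in the abstract. Averaging the per-subject MAEs (rather than pooling all days) is also an assumption, and the subject names and scores in the usage example are made up.

```python
from typing import Dict, List


def mean_absolute_error(y_true: List[float], y_pred: List[float]) -> float:
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def loso_mae(scores_by_subject: Dict[str, List[float]]) -> float:
    """Leave-one-subject-out MAE with a mean-score baseline.

    Each subject is held out in turn; the baseline is fitted on all other
    subjects' scores and evaluated on the held-out subject only.
    """
    fold_errors = []
    for held_out, test_scores in scores_by_subject.items():
        train = [
            score
            for subject, scores in scores_by_subject.items()
            if subject != held_out
            for score in scores
        ]
        prediction = sum(train) / len(train)  # placeholder "model"
        fold_errors.append(
            mean_absolute_error(test_scores, [prediction] * len(test_scores))
        )
    return sum(fold_errors) / len(fold_errors)


if __name__ == "__main__":
    # Made-up daily scores for two hypothetical subjects.
    print(loso_mae({"subject_a": [1.0, 3.0], "subject_b": [2.0, 2.0]}))
```

    Holding out whole subjects rather than individual days keeps a person's data from appearing in both training and test folds, so the error reflects performance on unseen individuals.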
    Differentially Private Generalized Linear Models Revisited. (arXiv:2205.03014v1 [cs.LG])
    We study the problem of $(\epsilon,\delta)$-differentially private learning of linear predictors with convex losses. We provide results for two subclasses of loss functions. The first case is when the loss is smooth and non-negative but not necessarily Lipschitz (such as the squared loss). For this case, we establish an upper bound on the excess population risk of $\tilde{O}\left(\frac{\Vert w^*\Vert}{\sqrt{n}} + \min\left\{\frac{\Vert w^* \Vert^2}{(n\epsilon)^{2/3}},\frac{\sqrt{d}\Vert w^*\Vert^2}{n\epsilon}\right\}\right)$, where $n$ is the number of samples, $d$ is the dimension of the problem, and $w^*$ is the minimizer of the population risk. Apart from the dependence on $\Vert w^\ast\Vert$, our bound is essentially tight in all parameters. In particular, we show a lower bound of $\tilde{\Omega}\left(\frac{1}{\sqrt{n}} + {\min\left\{\frac{\Vert w^*\Vert^{4/3}}{(n\epsilon)^{2/3}}, \frac{\sqrt{d}\Vert w^*\Vert}{n\epsilon}\right\}}\right)$. We also revisit the previously studied case of Lipschitz losses [SSTT20]. For this case, we close the gap in the existing work and show that the optimal rate is (up to log factors) $\Theta\left(\frac{\Vert w^*\Vert}{\sqrt{n}} + \min\left\{\frac{\Vert w^*\Vert}{\sqrt{n\epsilon}},\frac{\sqrt{\text{rank}}\Vert w^*\Vert}{n\epsilon}\right\}\right)$, where $\text{rank}$ is the rank of the design matrix. This improves over existing work in the high privacy regime. Finally, our algorithms involve a private model selection approach that we develop to enable attaining the stated rates without a-priori knowledge of $\Vert w^*\Vert$.  ( 2 min )
    Multichannel Synthetic Preictal EEG Signals to Enhance the Prediction of Epileptic Seizures. (arXiv:2205.03239v1 [eess.SP])
    Epilepsy is a chronic neurological disorder affecting 1% of people worldwide. Electroencephalograph (EEG) analysis based on deep learning (DL) algorithms makes accurate epileptic seizure (ES) prediction possible, thereby benefiting patients suffering from epilepsy. To identify the preictal region that precedes the onset of seizure, a large number of annotated EEG signals are required to train DL algorithms. However, the scarcity of seizure onsets leads to significant insufficiency of data for training the DL algorithms. To overcome this data insufficiency, in this paper, we propose a preictal artificial signal synthesis algorithm based on a generative adversarial network to generate synthetic multichannel EEG preictal samples. A high-quality single-channel architecture, determined by visual and statistical evaluations, is used to train the generators of multichannel samples. The effectiveness of the synthetic samples is evaluated by comparing the ES prediction performances without and with synthetic preictal sample augmentation. The leave-one-seizure-out cross validation ES prediction accuracy and corresponding area under the receiver operating characteristic curve evaluation improve from 73.0% and 0.676 to 78.0% and 0.704 by 10× synthetic sample augmentation, respectively. The obtained results indicate that synthetic preictal samples are effective for enhancing ES prediction performance.  ( 2 min )
    Over-the-Air Federated Multi-Task Learning Over MIMO Multiple Access Channels. (arXiv:2112.13603v2 [eess.SP] UPDATED)
    With the explosive growth of data and wireless devices, federated learning (FL) over wireless medium has emerged as a promising technology for large-scale distributed intelligent systems. Yet, the urgent demand for ubiquitous intelligence will generate a large number of concurrent FL tasks, which may seriously aggravate the scarcity of communication resources. By exploiting the analog superposition of electromagnetic waves, over-the-air computation (AirComp) is an appealing solution to alleviate the burden of communication required by FL. However, sharing frequency-time resources in over-the-air computation inevitably brings about the problem of inter-task interference, which poses a new challenge that needs to be appropriately addressed. In this paper, we study over-the-air federated multi-task learning (OA-FMTL) over the multiple-input multiple-output (MIMO) multiple access (MAC) channel. We propose a novel model aggregation method for the alignment of local gradients of different devices, which alleviates the straggler problem in over-the-air computation due to the channel heterogeneity. We establish a communication-learning analysis framework for the proposed OA-FMTL scheme by considering the spatial correlation between devices, and formulate an optimization problem for the design of transceiver beamforming and device selection. To solve this problem, we develop an algorithm by using alternating optimization (AO) and fractional programming (FP), which effectively mitigates the impact of inter-task interference on the FL learning performance. We show that due to the use of the new model aggregation method, device selection is no longer essential, thereby avoiding the heavy computational burden involved in selecting active devices. Numerical results demonstrate the validity of the analysis and the superb performance of the proposed scheme.  ( 2 min )
    Transferring Chemical and Energetic Knowledge Between Molecular Systems with Machine Learning. (arXiv:2205.03339v1 [physics.chem-ph])
    Predicting structural and energetic properties of a molecular system is one of the fundamental tasks in molecular simulations, and it has use cases in chemistry, biology, and medicine. In the past decade, the advent of machine learning algorithms has impacted on molecular simulations for various tasks, including property prediction of atomistic systems. In this paper, we propose a novel methodology for transferring knowledge obtained from simple molecular systems to a more complex one, possessing a significantly larger number of atoms and degrees of freedom. In particular, we focus on the classification of high and low free-energy states. Our approach relies on utilizing (i) a novel hypergraph representation of molecules, encoding all relevant information for characterizing the potential energy of a conformation, and (ii) novel message passing and pooling layers for processing and making predictions on such hypergraph-structured data. Despite the complexity of the problem, our results show a remarkable AUC of 0.92 for transfer learning from tri-alanine to the deca-alanine system. Moreover, we show that the very same transfer learning approach can be used to group, in an unsupervised way, various secondary structures of deca-alanine in clusters having similar free-energy values. Our study represents a proof of concept that reliable transfer learning models for molecular systems can be designed paving the way to unexplored routes in prediction of structural and energetic properties of biologically relevant systems.  ( 2 min )
    Optimally tackling covariate shift in RKHS-based nonparametric regression. (arXiv:2205.02986v1 [math.ST])
    We study the covariate shift problem in the context of nonparametric regression over a reproducing kernel Hilbert space (RKHS). We focus on two natural families of covariate shift problems defined using the likelihood ratios between the source and target distributions. When the likelihood ratios are uniformly bounded, we prove that the kernel ridge regression (KRR) estimator with a carefully chosen regularization parameter is minimax rate-optimal (up to a log factor) for a large family of RKHSs with regular kernel eigenvalues. Interestingly, KRR does not require full knowledge of the likelihood ratio apart from an upper bound on it. In striking contrast to the standard statistical setting without covariate shift, we also demonstrate that a naïve estimator, which minimizes the empirical risk over the function class, is strictly suboptimal under covariate shift as compared to KRR. We then address the larger class of covariate shift problems where the likelihood ratio is possibly unbounded yet has a finite second moment. Here, we show via careful simulations that KRR fails to attain the optimal rate. Instead, we propose a reweighted KRR estimator that weights samples based on a careful truncation of the likelihood ratios. Again, we are able to show that this estimator is minimax optimal, up to logarithmic factors.  ( 2 min )
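    A reweighted KRR estimator of the kind described above can be sketched with NumPy. This is a rough illustration under stated assumptions, not the paper's estimator: the truncation rule (capping each likelihood ratio at a threshold `tau`), the Gaussian kernel, and all function names are hypothetical choices, and the abstract does not say how the threshold or the regularization parameter should be set.

```python
import numpy as np


def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def truncated_weights(likelihood_ratio, tau):
    """Cap the likelihood ratios at tau (an assumed truncation rule)."""
    return np.minimum(likelihood_ratio, tau)


def fit_weighted_krr(x, y, weights, lam, bandwidth=1.0):
    """Minimize sum_i w_i (y_i - f(x_i))^2 + n * lam * ||f||^2 over the RKHS.

    Writing f = sum_j alpha_j k(x_j, .), the minimizer satisfies
    alpha = (W K + n * lam * I)^{-1} W y with W = diag(weights).
    """
    n = len(y)
    k = rbf_kernel(x, x, bandwidth)
    w = np.diag(weights)
    return np.linalg.solve(w @ k + n * lam * np.eye(n), w @ y)


def predict(x_train, alpha, x_new, bandwidth=1.0):
    """Evaluate the fitted function at the rows of x_new."""
    return rbf_kernel(x_new, x_train, bandwidth) @ alpha


# Toy example: source N(0, 1), target N(1, 1), so the ratio is exp(x - 1/2).
rng = np.random.default_rng(0)
x_src = rng.standard_normal((50, 1))
y_src = np.sin(x_src[:, 0]) + 0.1 * rng.standard_normal(50)
ratio = np.exp(x_src[:, 0] - 0.5)
alpha = fit_weighted_krr(x_src, y_src, truncated_weights(ratio, tau=3.0), lam=1e-2)
print(predict(x_src, alpha, np.array([[1.0]])))
```

    With all weights equal to one, the solve reduces to ordinary KRR, which makes a convenient sanity check.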
    Benchmarking Econometric and Machine Learning Methodologies in Nowcasting. (arXiv:2205.03318v1 [stat.ML])
    Nowcasting can play a key role in giving policymakers timelier insight into data published with a significant time lag, such as final GDP figures. Currently, there is a plethora of methodologies and approaches for practitioners to choose from. However, a comprehensive comparison of these disparate approaches in terms of predictive performance and characteristics is lacking. This paper addresses that deficiency by examining the performance of 12 different methodologies in nowcasting US quarterly GDP growth, including all the methods most commonly employed in nowcasting, as well as some of the most popular traditional machine learning approaches. Performance was assessed on three different tumultuous periods in US economic history: the early 1980s recession, the 2008 financial crisis, and the COVID crisis. The two best performing methodologies in the analysis were long short-term memory artificial neural networks (LSTM) and Bayesian vector autoregression (BVAR). To facilitate further application and testing of each of the examined methodologies, an open-source repository containing boilerplate code that can be applied to different datasets is published alongside the paper, available at: github.com/dhopp1/nowcasting_benchmark.
    Perseus: A Simple High-Order Regularization Method for Variational Inequalities. (arXiv:2205.03202v1 [math.OC])
    This paper settles an open and challenging question pertaining to the design of simple high-order regularization methods for solving smooth and monotone variational inequalities (VIs). A VI involves finding $x^\star \in \mathcal{X}$ such that $\langle F(x), x - x^\star\rangle \geq 0$ for all $x \in \mathcal{X}$ and we consider the setting where $F: \mathbb{R}^d \mapsto \mathbb{R}^d$ is smooth with up to $(p-1)^{th}$-order derivatives. For the case of $p = 2$, Nesterov (2006) extended the cubic regularized Newton's method to VIs with a global rate of $O(\epsilon^{-1})$. Monteiro and Svaiter (2012) proposed another second-order method which achieved an improved rate of $O(\epsilon^{-2/3}\log(1/\epsilon))$, but this method required a nontrivial binary search procedure as an inner loop. High-order methods based on similar binary search procedures have been further developed and shown to achieve a rate of $O(\epsilon^{-2/(p+1)}\log(1/\epsilon))$. However, such search procedures can be computationally prohibitive in practice, and the problem of finding a simple high-order regularization method remains an open and challenging question in optimization theory. We propose a $p^{th}$-order method which does not require any binary search scheme and is guaranteed to converge to a weak solution with a global rate of $O(\epsilon^{-2/(p+1)})$. A version with restarting attains a global linear and local superlinear convergence rate for smooth and strongly monotone VIs. Further, our method achieves a global rate of $O(\epsilon^{-2/p})$ for solving smooth and non-monotone VIs satisfying the Minty condition; moreover, the restarted version again attains a global linear and local superlinear convergence rate if the strong Minty condition holds.
    Out-of-Distribution Detection for Medical Applications: Guidelines for Practical Evaluation. (arXiv:2109.14885v2 [cs.LG] UPDATED)
    Detection of Out-of-Distribution (OOD) samples in real time is a crucial safety check for deployment of machine learning models in the medical field. Despite a growing number of uncertainty quantification techniques, there is a lack of evaluation guidelines on how to select OOD detection methods in practice. This gap impedes implementation of OOD detection methods for real-world applications. Here, we propose a series of practical considerations and tests to choose the best OOD detector for a specific medical dataset. These guidelines are illustrated on a real-life use case of Electronic Health Records (EHR). Our results can serve as a guide for implementation of OOD detection methods in clinical practice, mitigating risks associated with the use of machine learning models in healthcare.
    Tensor Principal Component Analysis in High Dimensional CP Models. (arXiv:2108.04428v3 [stat.ML] UPDATED)
    The CP decomposition for high dimensional non-orthogonal spiked tensors is an important problem with broad applications across many disciplines. However, previous works with theoretical guarantee typically assume restrictive incoherence conditions on the basis vectors for the CP components. In this paper, we propose new computationally efficient composite PCA and concurrent orthogonalization algorithms for tensor CP decomposition with theoretical guarantees under mild incoherence conditions. The composite PCA applies the principal component or singular value decompositions twice, first to a matrix unfolding of the tensor data to obtain singular vectors and then to the matrix folding of the singular vectors obtained in the first step. It can be used as an initialization for any iterative optimization schemes for the tensor CP decomposition. The concurrent orthogonalization algorithm iteratively estimates the basis vector in each mode of the tensor by simultaneously applying projections to the orthogonal complements of the spaces generated by other CP components in other modes. It is designed to improve the alternating least squares estimator and other forms of the high order orthogonal iteration for tensors with low or moderately high CP ranks, and it is guaranteed to converge rapidly when the error of any given initial estimator is bounded by a small constant. Our theoretical investigation provides estimation accuracy and convergence rates for the two proposed algorithms. Both proposed algorithms are applicable to deterministic tensor, its noisy version, and the order-$2K$ covariance tensor of order-$K$ tensor data in a factor model with uncorrelated factors. Our implementations on synthetic data demonstrate significant practical superiority of our approach over existing methods.
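    The two-step composite PCA idea described above (SVD of a matrix unfolding, then SVD of the folded singular vector) can be illustrated in the simplest possible case, a noise-free rank-1 tensor of order 3. This is a sketch under those simplifying assumptions only; the function name is hypothetical, and the paper's algorithm for several non-orthogonal components is not reproduced here.

```python
import numpy as np


def composite_pca_rank1(tensor):
    """Estimate the three unit-norm factors of a (near) rank-1 order-3 tensor.

    Factors are recovered only up to sign.
    """
    d1, d2, d3 = tensor.shape
    # Step 1: SVD of the mode-1 unfolding, a d1 x (d2 * d3) matrix.
    u, _, vt = np.linalg.svd(tensor.reshape(d1, d2 * d3), full_matrices=False)
    a1 = u[:, 0]
    # Step 2: fold the top right singular vector back into a d2 x d3 matrix
    # and apply a second SVD to separate the remaining two modes.
    folded = vt[0].reshape(d2, d3)
    u2, _, vt2 = np.linalg.svd(folded, full_matrices=False)
    return a1, u2[:, 0], vt2[0]


# Toy rank-1 tensor built from random unit vectors (illustrative data only).
rng = np.random.default_rng(1)
a, b, c = (v / np.linalg.norm(v) for v in
           (rng.standard_normal(4), rng.standard_normal(5), rng.standard_normal(6)))
t = 3.0 * np.einsum("i,j,k->ijk", a, b, c)
est_a, est_b, est_c = composite_pca_rank1(t)
print(abs(est_a @ a), abs(est_b @ b), abs(est_c @ c))
```

    As the abstract notes, an estimate of this kind is intended as an initialization for iterative schemes rather than as a final answer.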
    Transformer Embeddings of Irregularly Spaced Events and Their Participants. (arXiv:2201.00044v3 [cs.LG] UPDATED)
    The neural Hawkes process (Mei & Eisner, 2017) is a generative model of irregularly spaced sequences of discrete events. To handle complex domains with many event types, Mei et al. (2020a) further consider a setting in which each event in the sequence updates a deductive database of facts (via domain-specific pattern-matching rules); future events are then conditioned on the database contents. They show how to convert such a symbolic system into a neuro-symbolic continuous-time generative model, in which each database fact and the possible event has a time-varying embedding that is derived from its symbolic provenance. In this paper, we modify both models, replacing their recurrent LSTM-based architectures with flatter attention-based architectures (Vaswani et al., 2017), which are simpler and more parallelizable. This does not appear to hurt our accuracy, which is comparable to or better than that of the original models as well as (where applicable) previous attention-based methods (Zuo et al., 2020; Zhang et al., 2020a).
    Ultra-sensitive Flexible Sponge-Sensor Array for Muscle Activities Detection and Human Limb Motion Recognition. (arXiv:2205.03238v1 [eess.SP])
    Human limb motion tracking and recognition play an important role in medical rehabilitation training, lower limb assistance, prosthetics design for amputees, feedback control for assistive robots, etc. Lightweight wearable sensors, including inertial sensors, surface electromyography sensors, and flexible strain/pressure sensors, are promising to become the next-generation human motion capture devices. Herein, we present a wireless wearable device consisting of a sixteen-channel flexible sponge-based pressure sensor array to recognize various human lower limb motions by detecting contours on the human skin caused by calf gastrocnemius muscle actions. Each sensing element is a round porous structure of thin carbon nanotube/polydimethylsiloxane nanocomposites with a diameter of 4 mm and thickness of about 400 µm. Three human subjects were recruited to perform ten different lower limb motions while wearing the developed device. The motion classification result with the support vector machine method shows a macro-recall of about 94.48% for all ten motions tested. This work demonstrates a portable wearable muscle activity detection device with a lower limb motion recognition application, which can be potentially used in assistive robot control, healthcare, sports monitoring, etc.
    R-GCN: The R Could Stand for Random. (arXiv:2203.02424v2 [cs.LG] UPDATED)
    The inception of the Relational Graph Convolutional Network (R-GCN) marked a milestone in the Semantic Web domain as a widely cited method that generalises end-to-end hierarchical representation learning to Knowledge Graphs (KGs). R-GCNs generate representations for nodes of interest by repeatedly aggregating parameterised, relation-specific transformations of their neighbours. However, in this paper, we argue that the R-GCN's main contribution lies in this "message passing" paradigm, rather than the learned weights. To this end, we introduce the "Random Relational Graph Convolutional Network" (RR-GCN), which leaves all parameters untrained and thus constructs node embeddings by aggregating randomly transformed random representations from neighbours, i.e., with no learned parameters. We empirically show that RR-GCNs can compete with fully trained R-GCNs in both node classification and link prediction settings.
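    The idea of aggregating randomly transformed random representations from neighbours can be sketched in a few lines of NumPy. This is only an illustration of the message-passing pattern: the function name, the mean aggregation, the self-loop term, the ReLU, and the weight scaling are all assumptions and may differ from the RR-GCN described in the paper.

```python
import numpy as np


def random_relational_embeddings(num_nodes, triples, num_relations,
                                 dim=16, num_layers=2, seed=0):
    """Node embeddings from untrained, relation-specific random transforms.

    triples is a list of (source, relation, target) index tuples; messages
    flow from source to target. Nothing here is ever trained.
    """
    rng = np.random.default_rng(seed)
    h = rng.standard_normal((num_nodes, dim))  # random initial node features
    for _ in range(num_layers):
        # One random matrix per relation plus one for the self-loop.
        w_rel = rng.standard_normal((num_relations, dim, dim)) / np.sqrt(dim)
        w_self = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        messages = np.zeros_like(h)
        in_degree = np.zeros(num_nodes)
        for src, rel, dst in triples:
            messages[dst] += h[src] @ w_rel[rel]
            in_degree[dst] += 1
        # Mean over incoming messages (assumed normalisation), then ReLU.
        out = h @ w_self + messages / np.maximum(in_degree, 1)[:, None]
        h = np.maximum(out, 0.0)
    return h


# Toy knowledge graph with 4 nodes and 2 relation types (made-up data).
toy_triples = [(0, 0, 1), (2, 1, 1), (1, 0, 3), (3, 1, 0)]
emb = random_relational_embeddings(4, toy_triples, num_relations=2)
print(emb.shape)
```

    The resulting embeddings would then be passed to a separately trained downstream classifier; only that final model has learned parameters.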
    MPAF: Model Poisoning Attacks to Federated Learning based on Fake Clients. (arXiv:2203.08669v2 [cs.CR] UPDATED)
    Existing model poisoning attacks to federated learning assume that an attacker has access to a large fraction of compromised genuine clients. However, such assumption is not realistic in production federated learning systems that involve millions of clients. In this work, we propose the first Model Poisoning Attack based on Fake clients called MPAF. Specifically, we assume the attacker injects fake clients to a federated learning system and sends carefully crafted fake local model updates to the cloud server during training, such that the learnt global model has low accuracy for many indiscriminate test inputs. Towards this goal, our attack drags the global model towards an attacker-chosen base model that has low accuracy. Specifically, in each round of federated learning, the fake clients craft fake local model updates that point to the base model and scale them up to amplify their impact before sending them to the cloud server. Our experiments show that MPAF can significantly decrease the test accuracy of the global model, even if classical defenses and norm clipping are adopted, highlighting the need for more advanced defenses.
    Let's Go to the Alien Zoo: Introducing an Experimental Framework to Study Usability of Counterfactual Explanations for Machine Learning. (arXiv:2205.03398v1 [cs.HC])
    To foster usefulness and accountability of machine learning (ML), it is essential to explain a model's decisions in addition to evaluating its performance. Accordingly, the field of explainable artificial intelligence (XAI) has resurfaced as a topic of active research, offering approaches to address the "how" and "why" of automated decision-making. Within this domain, counterfactual explanations (CFEs) have gained considerable traction as a psychologically grounded approach to generate post-hoc explanations. To do so, CFEs highlight what changes to a model's input would have changed its prediction in a particular way. However, despite the introduction of numerous CFE approaches, their usability has yet to be thoroughly validated at the human level. Thus, to advance the field of XAI, we introduce the Alien Zoo, an engaging, web-based and game-inspired experimental framework. The Alien Zoo provides the means to evaluate usability of CFEs for gaining new knowledge from an automated system, targeting novice users in a domain-general context. As a proof of concept, we demonstrate the practical efficacy and feasibility of this approach in a user study. Our results suggest that users benefit from receiving CFEs compared to no explanation, both in terms of objective performance in the proposed iterative learning task, and subjective usability. With this work, we aim to equip research groups and practitioners with the means to easily run controlled and well-powered user studies to complement their otherwise often more technology-oriented work. Thus, in the interest of reproducible research, we provide the entire code, together with the underlying models and user data.
    Vehicle management in a modular production context using Deep Q-Learning. (arXiv:2205.03294v1 [cs.LG])
    We investigate the feasibility of deploying Deep-Q based deep reinforcement learning agents to job-shop scheduling problems in the context of modular production facilities, using discrete event simulations for the environment. These environments comprise a source and sink for the parts to be processed, as well as (several) workstations. The agents are trained to schedule automated guided vehicles to transport the parts back and forth between those stations in an optimal fashion. Starting from a very simplistic setup, we increase the complexity of the environment and compare the agents' performances with well established heuristic approaches, such as first-in-first-out based agents, cost tables and a nearest-neighbor approach. We furthermore seek particular configurations of the environments in which the heuristic approaches struggle, to investigate to what degree the Deep-Q agents are affected by these challenges. We find that Deep-Q based agents show performance comparable to that of the heuristic baselines. Furthermore, our findings suggest that the DRL agents exhibit an increased robustness to noise, as compared to the conventional approaches. Overall, we find that DRL agents constitute a valuable approach for this type of scheduling problem.
    Fundamental Performance Limits for Sensor-Based Robot Control and Policy Learning. (arXiv:2202.00129v2 [cs.RO] UPDATED)
    Our goal is to develop theory and algorithms for establishing fundamental limits on performance for a given task imposed by a robot's sensors. In order to achieve this, we define a quantity that captures the amount of task-relevant information provided by a sensor. Using a novel version of the generalized Fano inequality from information theory, we demonstrate that this quantity provides an upper bound on the highest achievable expected reward for one-step decision making tasks. We then extend this bound to multi-step problems via a dynamic programming approach. We present algorithms for numerically computing the resulting bounds, and demonstrate our approach on three examples: (i) the lava problem from the literature on partially observable Markov decision processes, (ii) an example with continuous state and observation spaces corresponding to a robot catching a freely-falling object, and (iii) obstacle avoidance using a depth sensor with non-Gaussian noise. We demonstrate the ability of our approach to establish strong limits on achievable performance for these problems by comparing our upper bounds with achievable lower bounds (computed by synthesizing or learning concrete control policies).
    DADApy: Distance-based Analysis of DAta-manifolds in Python. (arXiv:2205.03373v1 [cs.LG])
    DADApy is a python software package for analysing and characterising high-dimensional data manifolds. It provides methods for estimating the intrinsic dimension and the probability density, for performing density-based clustering and for comparing different distance metrics. We review the main functionalities of the package and exemplify its usage in toy cases and in a real-world application. The package is freely available under the open-source Apache 2.0 license and can be downloaded from the Github page https://github.com/sissa-data-science/DADApy.
    Designing Robust Biotechnological Processes Regarding Variabilities using Multi-Objective Optimization Applied to a Biopharmaceutical Seed Train Design. (arXiv:2205.03261v1 [cs.LG])
    Development and optimization of biopharmaceutical production processes with cell cultures is cost- and time-consuming and often performed rather empirically. Efficient optimization of multiple objectives like process time, viable cell density, number of operating steps & cultivation scales, required medium, amount of product as well as product quality is a promising approach. This contribution presents a workflow which couples uncertainty-based upstream simulation and Bayes optimization using Gaussian processes. Its application is demonstrated in a simulation case study for a relevant industrial task in process development, the design of a robust cell culture expansion process (seed train), meaning that despite uncertainties and variabilities concerning cell growth, low variations of viable cell density during the seed train are obtained. Compared to a non-optimized reference seed train, the optimized process showed much lower deviation rates regarding viable cell densities (<~10% instead of 41.7%) using 5 or 4 shake flask scales and seed train duration could be reduced by 56 h from 576 h to 520 h. Overall, it is shown that applying Bayes optimization allows for optimization of a multi-objective optimization function with several optimizable input variables and under a considerable number of constraints with a low computational effort. This approach provides the potential to be used in the form of a decision tool, e.g. for the choice of an optimal and robust seed train design or for further optimization tasks within process development.  ( 2 min )
    PTFlash: A deep learning framework for isothermal two-phase equilibrium calculations. (arXiv:2205.03090v1 [physics.chem-ph])
    Phase equilibrium calculations are an essential part of numerical simulations of multi-component multi-phase flow in porous media, accounting for the largest share of the computational time. In this work, we introduce a GPU-enabled, fast, and parallel framework, PTFlash, that vectorizes algorithms required for isothermal two-phase flash calculations using PyTorch, and can facilitate a wide range of downstream applications. In addition, to further accelerate PTFlash, we design two task-specific neural networks, one for predicting the stability of given mixtures and the other for providing estimates of the distribution coefficients, which are trained offline and help shorten computation time by sidestepping stability analysis and reducing the number of iterations to reach convergence. The evaluation of PTFlash was conducted on three case studies involving hydrocarbons, CO$_2$ and N$_2$, for which the phase equilibrium was tested over a large range of temperature, pressure and composition conditions, using the Soave-Redlich-Kwong (SRK) equation of state. We compare PTFlash with an in-house thermodynamic library, Carnot, written in C++ and performing flash calculations one by one on CPU. Results show speed-ups on large scale calculations up to two orders of magnitude, while maintaining perfect precision with the reference solution provided by Carnot.  ( 2 min )
    On boundary conditions parametrized by analytic functions. (arXiv:2205.03185v1 [cs.LG])
    Computer algebra can answer various questions about partial differential equations using symbolic algorithms. However, the inclusion of data into equations is rare in computer algebra. Therefore, recently, computer algebra models have been combined with Gaussian processes, a regression model in machine learning, to describe the behavior of certain differential equations under data. While it was possible to describe polynomial boundary conditions in this context, we extend these models to analytic boundary conditions. Additionally, we describe the necessary algorithms for Gröbner and Janet bases of Weyl algebras with certain analytic coefficients. Using these algorithms, we provide examples of divergence-free flow in domains bounded by analytic functions and adapted to observations.  ( 2 min )
    Federated Learning with Noisy User Feedback. (arXiv:2205.03092v1 [cs.LG])
    Machine Learning (ML) systems are getting increasingly popular, and drive more and more applications and services in our daily life. This has led to growing concerns over user privacy, since human interaction data typically needs to be transmitted to the cloud in order to train and improve such systems. Federated learning (FL) has recently emerged as a method for training ML models on edge devices using sensitive user data and is seen as a way to mitigate concerns over data privacy. However, since ML models are most commonly trained with label supervision, we need a way to extract labels on edge to make FL viable. In this work, we propose a strategy for training FL models using positive and negative user feedback. We also design a novel framework to study different noise patterns in user feedback, and explore how well standard noise-robust objectives can help mitigate this noise when training models in a federated setting. We evaluate our proposed training setup through detailed experiments on two text classification datasets and analyze the effects of varying levels of user reliability and feedback noise on model performance. We show that our method improves substantially over a self-training baseline, achieving performance closer to models trained with full supervision.  ( 2 min )
    Investigating and Explaining the Frequency Bias in Image Classification. (arXiv:2205.03154v1 [cs.CV])
    CNNs exhibit many behaviors different from humans, one of which is the capability of employing high-frequency components. This paper discusses the frequency bias phenomenon in image classification tasks: the high-frequency components are actually much less exploited than the low- and mid-frequency components. We first investigate the frequency bias phenomenon by presenting two observations on feature discrimination and learning priority. Furthermore, we hypothesize that (i) the spectral density and (ii) the class consistency directly affect the frequency bias. Specifically, our investigations verify that the spectral density of datasets mainly affects the learning priority, while the class consistency mainly affects the feature discrimination.  ( 2 min )
    TTRS: Tinkoff Transactions Recommender System benchmark. (arXiv:2110.05589v2 [cs.LG] UPDATED)
    Over the past decade, tremendous progress has been made in inventing new RecSys methods. However, one of the fundamental problems of the RecSys research community remains the lack of applied datasets and benchmarks with well-defined evaluation rules and metrics to test these novel approaches. In this article, we present the TTRS - Tinkoff Transactions Recommender System benchmark. This financial transaction benchmark contains over 2 million interactions between almost 10,000 users and more than 1,000 merchant brands over 14 months. To the best of our knowledge, this is the first publicly available financial transactions dataset. To make it more suitable for possible applications, we provide a complete description of the data collection pipeline, its preprocessing, and the resulting dataset statistics. We also present a comprehensive comparison of the current popular RecSys methods on the next-period recommendation task and conduct a detailed analysis of their performance against various metrics and datasets.  ( 2 min )
    Building a 3-Player Mahjong AI using Deep Reinforcement Learning. (arXiv:2202.12847v2 [cs.AI] UPDATED)
    Mahjong is a popular multi-player imperfect-information game developed in China in the late 19th century, with some very challenging features for AI research. Sanma, being a 3-player variant of the Japanese Riichi Mahjong, possesses unique characteristics including fewer tiles and, consequently, a more aggressive playing style. It is thus challenging and of great research interest in its own right, but has not yet been explored. In this paper, we present Meowjong, an AI for Sanma using deep reinforcement learning. We define an informative and compact 2-dimensional data structure for encoding the observable information in a Sanma game. We pre-train 5 convolutional neural networks (CNNs) for Sanma's 5 actions (discard, Pon, Kan, Kita and Riichi), and enhance the major action's model, namely the discard model, via self-play reinforcement learning using the Monte Carlo policy gradient method. Meowjong's models achieve test accuracies comparable with AIs for 4-player Mahjong through supervised learning, and gain a significant further enhancement from reinforcement learning. Since Meowjong is the first ever AI for Sanma, we claim that it stands as the state of the art in this game.  ( 2 min )
    Towards QD-suite: developing a set of benchmarks for Quality-Diversity algorithms. (arXiv:2205.03207v1 [cs.LG])
    While the field of Quality-Diversity (QD) has grown into a distinct branch of stochastic optimization, a few problems, in particular locomotion and navigation tasks, have become de facto standards. Are such benchmarks sufficient? Are they representative of the key challenges faced by QD algorithms? Do they provide the ability to focus on one particular challenge by properly disentangling it from others? Do they have much predictive power in terms of scalability and generalization? Existing benchmarks are not standardized, and there is currently no MNIST equivalent for QD. Inspired by recent works on Reinforcement Learning benchmarks, we argue that the identification of challenges faced by QD methods and the development of targeted, challenging, scalable but affordable benchmarks is an important step. As an initial effort, we identify three problems that are challenging in sparse reward settings, and propose associated benchmarks: (1) Behavior metric bias, which can result from the use of metrics that do not match the structure of the behavior space. (2) Behavioral Plateaus, with varying characteristics, such that escaping them would require adaptive QD algorithms and (3) Evolvability Traps, where small variations in genotype result in large behavioral changes. The environments that we propose satisfy the properties listed above.  ( 2 min )
    Remote Blood Oxygen Estimation From Videos Using Neural Networks. (arXiv:2107.05087v2 [cs.LG] UPDATED)
    Blood oxygen saturation (SpO$_2$) is an essential indicator of respiratory functionality and is receiving increasing attention during the COVID-19 pandemic. Clinical findings show that it is possible for COVID-19 patients to have significantly low SpO$_2$ before any obvious symptoms. The prevalence of cameras has motivated researchers to investigate methods for monitoring SpO$_2$ using videos. Most prior schemes involving smartphones are contact-based: They require a fingertip to cover the phone's camera and the nearby light source to capture re-emitted light from the illuminated tissue. In this paper, we propose the first convolutional neural network based noncontact SpO$_2$ estimation scheme using smartphone cameras. The scheme analyzes the videos of a participant's hand for physiological sensing, which is convenient and comfortable, and can protect their privacy and allow for keeping face masks on. We design our neural network architectures inspired by the optophysiological models for SpO$_2$ measurement and demonstrate the explainability by visualizing the weights for channel combination. Our proposed models outperform the state-of-the-art model that is designed for contact-based SpO$_2$ measurement, showing the potential of our proposed method to contribute to public health. We also analyze the impact of skin type and the side of a hand on SpO$_2$ estimation performance.  ( 2 min )
    TAGLETS: A System for Automatic Semi-Supervised Learning with Auxiliary Data. (arXiv:2111.04798v3 [cs.LG] UPDATED)
    Machine learning practitioners often have access to a spectrum of data: labeled data for the target task (which is often limited), unlabeled data, and auxiliary data, the many available labeled datasets for other tasks. We describe TAGLETS, a system built to study techniques for automatically exploiting all three types of data and creating high-quality, servable classifiers. The key components of TAGLETS are: (1) auxiliary data organized according to a knowledge graph, (2) modules encapsulating different methods for exploiting auxiliary and unlabeled data, and (3) a distillation stage in which the ensembled modules are combined into a servable model. We compare TAGLETS with state-of-the-art transfer learning and semi-supervised learning methods on four image classification tasks. Our study covers a range of settings, varying the amount of labeled data and the semantic relatedness of the auxiliary data to the target task. We find that the intelligent incorporation of auxiliary and unlabeled data into multiple learning techniques enables TAGLETS to match, and most often significantly surpass, these alternatives. TAGLETS is available as an open-source system at github.com/BatsResearch/taglets.  ( 2 min )
    Optimal quantum dataset for learning a unitary transformation. (arXiv:2203.00546v2 [quant-ph] UPDATED)
    Unitary transformations formulate the time evolution of quantum states. How to learn a unitary transformation efficiently is a fundamental problem in quantum machine learning. The most natural and leading strategy is to train a quantum machine learning model based on a quantum dataset. Although the presence of more training data results in better models, using too much data reduces the efficiency of training. In this work, we solve the problem of the minimum size of sufficient quantum datasets for learning a unitary transformation exactly, which reveals the power and limitation of quantum data. First, we prove that the minimum size of a dataset with pure states is $2^n$ for learning an $n$-qubit unitary transformation. To fully explore the capability of quantum data, we introduce a practical quantum dataset consisting of $n+1$ elementary tensor product states that are sufficient for exact training. The main idea is to simplify the structure utilizing decoupling, which leads to an exponential improvement in size over the datasets with pure states. Furthermore, we show that the size of a quantum dataset with mixed states can be reduced to a constant, which yields an optimal quantum dataset for learning a unitary. We showcase the applications of our results in oracle compiling and Hamiltonian simulation. Notably, to accurately simulate a 3-qubit one-dimensional nearest-neighbor Heisenberg model, our circuit only uses $48$ elementary quantum gates, which is significantly fewer than the $4320$ gates in the circuit constructed by the Trotter-Suzuki product formula.  ( 2 min )
    LPGNet: Link Private Graph Networks for Node Classification. (arXiv:2205.03105v1 [cs.LG])
    Classification tasks on labeled graph-structured data have many important applications ranging from social recommendation to financial modeling. Deep neural networks are increasingly being used for node classification on graphs, wherein nodes with similar features have to be given the same label. Graph convolutional networks (GCNs) are one such widely studied neural network architecture that perform well on this task. However, powerful link-stealing attacks on GCNs have recently shown that even with black-box access to the trained model, inferring which links (or edges) are present in the training graph is practical. In this paper, we present a new neural network architecture called LPGNet for training on graphs with privacy-sensitive edges. LPGNet provides differential privacy (DP) guarantees for edges using a novel design for how graph edge structure is used during training. We empirically show that LPGNet models often lie in the sweet spot between providing privacy and utility: They can offer better utility than "trivially" private architectures which use no edge information (e.g., vanilla MLPs) and better resilience against existing link-stealing attacks than vanilla GCNs which use the full edge structure. LPGNet also offers consistently better privacy-utility tradeoffs than DPGCN, which is the state-of-the-art mechanism for retrofitting differential privacy into conventional GCNs, in most of our evaluated datasets.  ( 2 min )
    Controlled Dropout for Uncertainty Estimation. (arXiv:2205.03109v1 [cs.LG])
    Uncertainty quantification in a neural network is one of the most discussed topics for safety-critical applications. Though Neural Networks (NNs) have achieved state-of-the-art performance for many applications, they still provide unreliable point predictions, which lack information about uncertainty estimates. Among various methods to enable neural networks to estimate uncertainty, Monte Carlo (MC) dropout has gained much popularity in a short period due to its simplicity. In this study, we present a new version of the traditional dropout layer where we are able to fix the number of dropout configurations. As such, each layer can take and apply the new dropout layer in the MC method to quantify the uncertainty associated with NN predictions. We conduct experiments on both toy and realistic datasets and compare the results with the MC method using the traditional dropout layer. Performance analysis utilizing uncertainty evaluation metrics corroborates that our dropout layer offers better performance in most cases.  ( 2 min )
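For readers unfamiliar with the baseline the abstract compares against, the following is a minimal sketch of standard Monte Carlo dropout, not the controlled dropout layer proposed in the paper. It assumes a hypothetical one-hidden-layer ReLU network with hand-picked weights; the function names and the network are illustrative only. Dropout stays active at prediction time, and the spread of repeated stochastic passes serves as the uncertainty estimate.

```python
import random
import statistics


def forward_with_dropout(x, hidden_units, output_weights, drop_p, rng):
    """One stochastic pass through a tiny one-hidden-layer ReLU network."""
    activations = []
    for weight, bias in hidden_units:
        h = max(0.0, weight * x + bias)          # ReLU activation
        if rng.random() < drop_p:
            activations.append(0.0)              # unit dropped in this pass
        else:
            activations.append(h / (1.0 - drop_p))  # inverted-dropout scaling
    return sum(v * h for v, h in zip(output_weights, activations))


def mc_dropout_predict(x, hidden_units, output_weights,
                       drop_p=0.5, passes=200, seed=0):
    """Return (mean, standard deviation) over repeated dropout passes."""
    rng = random.Random(seed)
    samples = [
        forward_with_dropout(x, hidden_units, output_weights, drop_p, rng)
        for _ in range(passes)
    ]
    return statistics.mean(samples), statistics.stdev(samples)


# Hypothetical weights: three hidden units given as (weight, bias) pairs.
hidden = [(1.0, 0.0), (2.0, 0.0), (-1.0, 1.0)]
out = [1.0, 0.5, 2.0]
mean, std = mc_dropout_predict(1.0, hidden, out)
print(f"prediction {mean:.2f} +/- {std:.2f}")
```

With `drop_p=0.0` every pass is identical, so the reported standard deviation collapses to zero; that is the deterministic point prediction the abstract describes as lacking uncertainty information.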
    Newton-MR: Inexact Newton Method With Minimum Residual Sub-problem Solver. (arXiv:1810.00303v4 [math.OC] UPDATED)
    We consider a variant of the inexact Newton method, called Newton-MR, in which the least-squares sub-problems are solved approximately using the Minimum Residual method. By construction, Newton-MR can be readily applied for unconstrained optimization of a class of non-convex problems known as invex, which subsumes convexity as a sub-class. For invex optimization, instead of the classical Lipschitz continuity assumptions on gradient and Hessian, Newton-MR's global convergence can be guaranteed under a weaker notion of joint regularity of Hessian and gradient. We also obtain Newton-MR's problem-independent local convergence to the set of minima. We show that fast local/global convergence can be guaranteed under a novel inexactness condition, which, to our knowledge, is much weaker than those in prior related works. Numerical results demonstrate the performance of Newton-MR as compared with several other Newton-type alternatives on a few machine learning problems.  ( 2 min )
    How to Spend Your Robot Time: Bridging Kickstarting and Offline Reinforcement Learning for Vision-based Robotic Manipulation. (arXiv:2205.03353v1 [cs.RO])
    Reinforcement learning (RL) has been shown to be effective at learning control from experience. However, RL typically requires a large amount of online interaction with the environment. This limits its applicability to real-world settings, such as in robotics, where such interaction is expensive. In this work we investigate ways to minimize online interactions in a target task, by reusing a suboptimal policy we might have access to, for example from training on related prior tasks, or in simulation. To this end, we develop two RL algorithms that can speed up training by using not only the action distributions of teacher policies, but also data collected by such policies on the task at hand. We conduct a thorough experimental study of how to use suboptimal teachers on a challenging robotic manipulation benchmark on vision-based stacking with diverse objects. We compare our methods to offline, online, offline-to-online, and kickstarting RL algorithms. By doing so, we find that training on data from both the teacher and the student enables the best performance for limited data budgets. We examine how to best allocate a limited data budget on the target task between the teacher and the student policy, and report experiments using varying budgets, two teachers with different degrees of suboptimality, and five stacking tasks that require a diverse set of behaviors. Our analysis, both in simulation and in the real world, shows that our approach is the best across data budgets, while standard offline RL from teacher rollouts is surprisingly effective when enough data is given.  ( 2 min )
    HumanAL: Calibrating Human Matching Beyond a Single Task. (arXiv:2205.03209v1 [cs.DB])
    This work offers a novel view on the use of human input as labels, acknowledging that humans may err. We build a behavioral profile for human annotators which is used as a feature representation of the provided input. We show that by utilizing black-box machine learning, we can take into account human behavior and calibrate their input to improve the labeling quality. To support our claims and provide a proof-of-concept, we experiment with three different matching tasks, namely, schema matching, entity matching and text matching. Our empirical evaluation suggests that the method can improve the quality of gathered labels in multiple settings including cross-domain (across different matching tasks).  ( 2 min )
    Beyond backpropagation: implicit gradients for bilevel optimization. (arXiv:2205.03076v1 [cs.LG])
    This paper reviews gradient-based techniques to solve bilevel optimization problems. Bilevel optimization is a general way to frame the learning of systems that are implicitly defined through a quantity that they minimize. This characterization can be applied to neural networks, optimizers, algorithmic solvers and even physical systems, and allows for greater modeling flexibility compared to an explicit definition of such systems. Here we focus on gradient-based approaches that solve such problems. We divide them into two categories: those rooted in implicit differentiation, and those that leverage the equilibrium propagation theorem. We present the mathematical foundations behind such methods, introduce the gradient-estimation algorithms in detail and compare the competitive advantages of the different approaches.  ( 2 min )
    Fast Rate Generalization Error Bounds: Variations on a Theme. (arXiv:2205.03131v1 [cs.IT])
    A recent line of work, initiated by Russo and Zou (2016) and Xu and Raginsky (2017), has shown that the generalization error of a learning algorithm can be upper bounded by information measures. In most of the relevant works, the convergence rate of the expected generalization error is of the form $O(\sqrt{\lambda/n})$, where $\lambda$ is an information-theoretic quantity such as the mutual information between the data sample and the learned hypothesis. However, such a learning rate is typically considered to be "slow" compared to the "fast rate" of $O(1/n)$ in many learning scenarios. In this work, we first show that the square root does not necessarily imply a slow rate, and a fast rate ($O(1/n)$) result can still be obtained using this bound under appropriate assumptions. Furthermore, we identify the key conditions needed for the fast rate generalization error, which we call the $(\eta,c)$-central condition. Under this condition, we give information-theoretic bounds on the generalization error and excess risk, with a convergence rate of $O(\lambda/n)$ for specific learning algorithms such as empirical risk minimization. Finally, analytical examples are given to show the effectiveness of the bounds.  ( 2 min )
    Characterizing TMS-EEG perturbation indexes using signal energy: initial study on Alzheimer's Disease classification. (arXiv:2205.03241v1 [eess.SP])
    Transcranial Magnetic Stimulation (TMS) combined with EEG recordings (TMS-EEG) has shown great potential in the study of the brain and in particular of Alzheimer's Disease (AD). In this study, we propose an automatic method of determining the duration of the TMS-induced perturbation of the EEG signal as a potential metric reflecting the brain's functional alterations. A preliminary study is conducted in patients with AD. Three metrics for characterizing the strength and duration of TMS-evoked EEG (TEP) activity are proposed, and their potential for distinguishing AD patients from healthy controls is investigated. A dataset of TMS-EEG recordings from 17 AD patients and 17 healthy controls (HC) was used in our analysis. A Random Forest classification algorithm was trained on the extracted TEP metrics and its performance was evaluated in a leave-one-subject-out cross-validation. The resulting model showed promising results in distinguishing AD patients from HC, with an accuracy, sensitivity and specificity of 69.32%, 72.23% and 66.41%, respectively.  ( 2 min )
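The leave-one-subject-out protocol mentioned in the abstract can be sketched as follows. This is only an illustration of the splitting scheme under the assumption that each sample carries a subject identifier; the function name and the toy identifiers are hypothetical, and the paper's feature extraction and Random Forest training are not reproduced. Every fold holds out all samples of exactly one subject, so the classifier is never tested on a subject it has seen during training.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) per subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


# Toy example: five recordings from three hypothetical subjects.
subjects = ["s1", "s1", "s2", "s3", "s3"]
for held_out, train_idx, test_idx in leave_one_subject_out(subjects):
    print(held_out, "train:", train_idx, "test:", test_idx)
```

Splitting by subject rather than by sample matters when a subject contributes several recordings, since a random split would leak subject-specific signal into the test set.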
    Detection of Propaganda Techniques in Visuo-Lingual Metaphor in Memes. (arXiv:2205.02937v1 [cs.CV])
    The exponential rise of social media networks has allowed the production, distribution, and consumption of data at a phenomenal rate. Moreover, the social media revolution has brought a unique phenomenon to social media platforms called Internet memes. Internet memes are one of the most popular types of content on social media, and they can take the form of images with a witty, catchy, or satirical text description. In this paper, we deal with the propaganda that is often seen in Internet memes in recent times. Propaganda is communication that frequently includes psychological and rhetorical techniques to manipulate or influence an audience to act or respond as the propagandist wants. To detect propaganda in Internet memes, we propose a multimodal deep learning fusion system that fuses the text and image feature representations and outperforms individual models based solely on either text or image modalities.  ( 2 min )
    Emp-RFT: Empathetic Response Generation via Recognizing Feature Transitions between Utterances. (arXiv:2205.03112v1 [cs.CL])
    Each utterance in multi-turn empathetic dialogues has features such as emotion, keywords, and utterance-level meaning. Feature transitions between utterances occur naturally. However, existing approaches fail to perceive the transitions because they extract features for the context at the coarse-grained level. To solve this issue, we propose a novel approach of recognizing feature transitions between utterances, which helps to understand the dialogue flow and to better grasp the features of the utterance that needs attention. We also introduce a response generation strategy that helps focus on emotion and keywords related to appropriate features when generating responses. Experimental results show that our approach outperforms baselines and, in particular, achieves significant improvements on multi-turn dialogues.  ( 2 min )
    Scalable computation of prediction intervals for neural networks via matrix sketching. (arXiv:2205.03194v1 [stat.ML])
    Accounting for the uncertainty in the predictions of modern neural networks is a challenging and important task in many domains. Existing algorithms for uncertainty estimation require modifying the model architecture and training procedure (e.g., Bayesian neural networks) or dramatically increase the computational cost of predictions such as approaches based on ensembling. This work proposes a new algorithm that can be applied to a given trained neural network and produces approximate prediction intervals. The method is based on the classical delta method in statistics but achieves computational efficiency by using matrix sketching to approximate the Jacobian matrix. The resulting algorithm is competitive with state-of-the-art approaches for constructing predictive intervals on various regression datasets from the UCI repository.  ( 2 min )
    Belief propagation for permutations, rankings, and partial orders. (arXiv:2110.00513v2 [cs.AI] UPDATED)
    Many datasets give partial information about an ordering or ranking by indicating which team won a game, which item a user prefers, or who infected whom. We define a continuous spin system whose Gibbs distribution is the posterior distribution on permutations, given a probabilistic model of these interactions. Using the cavity method we derive a belief propagation algorithm that computes the marginal distribution of each node's position. In addition, the Bethe free energy lets us approximate the number of linear extensions of a partial order and perform model selection between competing probabilistic models, such as the Bradley-Terry-Luce model of noisy comparisons and its cousins.  ( 2 min )
    Learning Optimal Propagation for Graph Neural Networks. (arXiv:2205.02998v1 [cs.LG])
    Graph Neural Networks (GNNs) have achieved tremendous success in a variety of real-world applications by relying on the fixed graph data as input. However, the initial input graph might not be optimal in terms of specific downstream tasks, because of information scarcity, noise, adversarial attacks, or discrepancies between the distribution in graph topology, features, and groundtruth labels. In this paper, we propose a bi-level optimization-based approach for learning the optimal graph structure via directly learning the Personalized PageRank propagation matrix as well as the downstream semi-supervised node classification simultaneously. We also explore a low-rank approximation model for further reducing the time complexity. Empirical evaluations show the superior efficacy and robustness of the proposed model over all baseline methods.  ( 2 min )
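As background for the propagation matrix named in the abstract, here is a minimal sketch of fixed Personalized PageRank computed by power iteration. It is not the paper's method, which learns the propagation matrix jointly with the classifier; the adjacency-list representation, the teleport probability `alpha`, and the handling of dangling nodes are assumptions made for illustration.

```python
def personalized_pagerank(adjacency, seed, alpha=0.15, iterations=100):
    """Power iteration for PPR scores of all nodes relative to one seed node.

    adjacency[u] lists the neighbours of node u; alpha is the probability
    of teleporting back to the seed at every step.
    """
    n = len(adjacency)
    scores = [1.0 if i == seed else 0.0 for i in range(n)]
    for _ in range(iterations):
        nxt = [alpha if i == seed else 0.0 for i in range(n)]
        for u, neighbours in enumerate(adjacency):
            if not neighbours:
                # Dangling node: send its mass back to the seed (assumption).
                nxt[seed] += (1.0 - alpha) * scores[u]
                continue
            share = (1.0 - alpha) * scores[u] / len(neighbours)
            for v in neighbours:
                nxt[v] += share
        scores = nxt
    return scores


# Path graph 0 - 1 - 2, personalised to node 0.
path = [[1], [0, 2], [1]]
print(personalized_pagerank(path, seed=0))
```

Stacking the score vectors for every seed node row by row gives a dense propagation matrix; in a fixed-propagation model, that matrix is what decides how much each node's prediction borrows from every other node.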
    Explaining the Effectiveness of Multi-Task Learning for Efficient Knowledge Extraction from Spine MRI Reports. (arXiv:2205.02979v1 [cs.LG])
    Pretrained Transformer-based models fine-tuned on domain-specific corpora have changed the landscape of NLP. However, training or fine-tuning these models for individual tasks can be time-consuming and resource-intensive. Thus, a lot of current research is focused on using transformers for multi-task learning (Raffel et al., 2020) and on how to group the tasks to help a multi-task model learn effective representations that can be shared across tasks (Standley et al., 2020; Fifty et al., 2021). In this work, we show that a single multi-tasking model can match the performance of task-specific models when the task-specific models show similar representations across all of their hidden layers and their gradients are aligned, i.e. their gradients follow the same direction. We hypothesize that the above observations explain the effectiveness of multi-task learning. We validate our observations on our internal radiologist-annotated datasets on the cervical and lumbar spine. Our method is simple and intuitive, and can be used in a wide range of NLP problems.  ( 2 min )
    Design Target Achievement Index: A Differentiable Metric to Enhance Deep Generative Models in Multi-Objective Inverse Design. (arXiv:2205.03005v1 [cs.LG])
    Deep Generative Machine Learning Models have been growing in popularity across the design community thanks to their ability to learn and mimic complex data distributions. While early works are promising, further advancement will depend on addressing several critical considerations such as design quality, feasibility, novelty, and targeted inverse design. We propose the Design Target Achievement Index (DTAI), a differentiable, tunable metric that scores a design's ability to achieve designer-specified minimum performance targets. We demonstrate that DTAI can drastically improve the performance of generated designs when directly used as a training loss in Deep Generative Models. We apply the DTAI loss to a Performance-Augmented Diverse GAN (PaDGAN) and demonstrate superior generative performance compared to a set of baseline Deep Generative Models including a Multi-Objective PaDGAN and specialized tabular generation algorithms like the Conditional Tabular GAN (CTGAN). We further enhance PaDGAN with an auxiliary feasibility classifier to encourage feasible designs. To evaluate methods, we propose a comprehensive set of evaluation metrics for generative methods that focus on feasibility, diversity, and satisfaction of design performance targets. Methods are tested on a challenging benchmarking problem: the FRAMED bicycle frame design dataset featuring mixed-datatype parametric data, heavily skewed and multimodal distributions, and ten competing performance objectives.  ( 2 min )
    A CNN Approach for 5G mmWave Positioning Using Beamformed CSI Measurements. (arXiv:2205.03236v1 [eess.SP])
    The advent of Artificial Intelligence (AI) has impacted all aspects of human life. One of the concrete examples of AI impact is visible in radio positioning. In this article, for the first time we utilize the power of AI by training a Convolutional Neural Network (CNN) using 5G New Radio (NR) fingerprints consisting of beamformed Channel State Information (CSI). By observing CSI, it is possible to characterize the multipath channel between the transmitter and the receiver, and thus provide a good source of spatiotemporal data to find the position of a User Equipment (UE). We collect ray-tracing-based 5G NR CSI from an urban area. The CSI data of the signals from one Base Station (BS) is collected at the reference points with known positions to train a CNN. We evaluate our work by testing: a) the robustness of the trained network for estimating the positions for the new measurements on the same reference points and b) the accuracy of the CNN-based position estimation while the UE is on points other than the reference points. The results prove that our trained network for a specific urban environment can estimate the UE position with a minimum mean error of 0.98 m.  ( 2 min )
    Dynamic Sparse Training for Deep Reinforcement Learning. (arXiv:2106.04217v3 [cs.LG] UPDATED)
    Deep reinforcement learning (DRL) agents are trained through trial-and-error interactions with the environment. This leads to a long training time for dense neural networks to achieve good performance. Hence, prohibitive computation and memory resources are consumed. Recently, learning efficient DRL agents has received increasing attention. Yet, current methods focus on accelerating inference time. In this paper, we introduce for the first time a dynamic sparse training approach for deep reinforcement learning to accelerate the training process. The proposed approach trains a sparse neural network from scratch and dynamically adapts its topology to the changing data distribution during training. Experiments on continuous control tasks show that our dynamic sparse agents achieve higher performance than the equivalent dense methods, reduce the parameter count and floating-point operations (FLOPs) by 50%, and have a faster learning speed that enables reaching the performance of dense agents with 40-50% reduction in the training steps.  ( 2 min )
    Green Accelerated Hoeffding Tree. (arXiv:2205.03184v1 [cs.LG])
    State-of-the-art machine learning solutions mainly focus on creating highly accurate models without constraints on hardware resources. Stream mining algorithms are designed to run on resource-constrained devices, so a focus on low power consumption and on energy and memory efficiency is essential. The Hoeffding tree algorithm is able to create energy-efficient models, but at the cost of less accurate trees in comparison to their ensemble counterparts. Ensembles of Hoeffding trees, on the other hand, create a highly accurate forest of trees but consume five times more energy on average. An extension that tried to obtain similar results to ensembles of Hoeffding trees was the Extremely Fast Decision Tree (EFDT). This paper presents the Green Accelerated Hoeffding Tree (GAHT) algorithm, an extension of the EFDT algorithm with a lower energy and memory footprint and the same (or higher for some datasets) accuracy levels. GAHT grows the tree by setting individual splitting criteria for each node, based on the distribution of the number of instances over each particular leaf. The results show that GAHT is able to achieve accuracy competitive with EFDT and ensembles of Hoeffding trees while reducing the energy consumption by up to 70%.  ( 2 min )
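The split decision that Hoeffding trees build on can be sketched with the standard Hoeffding bound. This is the textbook test, not GAHT's per-node criteria, which the abstract does not specify; the function names and the example numbers are hypothetical. A leaf that has seen `n` instances splits once the observed gain difference between the two best attributes exceeds `epsilon = sqrt(R^2 * ln(1/delta) / (2n))`, where `R` is the range of the gain measure and `delta` the allowed error probability.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Epsilon such that the true mean lies within epsilon of the observed
    mean with probability at least 1 - delta after n observations."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, value_range, delta, n):
    """Split when the best attribute beats the runner-up by more than epsilon."""
    return best_gain - second_gain > hoeffding_bound(value_range, delta, n)


# The same observed gain gap becomes decisive only once enough instances arrive.
for n in (100, 1_000, 10_000):
    eps = hoeffding_bound(value_range=1.0, delta=1e-7, n=n)
    print(n, round(eps, 4), should_split(0.30, 0.20, 1.0, 1e-7, n))
```

Because epsilon shrinks only with the square root of `n`, a leaf may need to see many instances before it is allowed to split; that waiting cost is what accelerated variants of the algorithm aim to reduce.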
    A Deep Bayesian Bandits Approach for Anticancer Therapy: Exploration via Functional Prior. (arXiv:2205.02944v1 [cs.LG])
    Learning personalized cancer treatment with machine learning holds great promise to improve cancer patients' chance of survival. Despite recent advances in machine learning and precision oncology, this approach remains challenging, as collecting data in preclinical/clinical studies for modeling multiple treatment efficacies is often an expensive, time-consuming process. Moreover, the randomization in treatment allocation proves to be suboptimal, since some participants/samples do not receive the most appropriate treatments during the trial. To address this challenge, we formulate the drug screening study as a "contextual bandit" problem, in which an algorithm selects anticancer therapeutics based on contextual information about cancer cell lines while adapting its treatment strategy to maximize treatment response in an "online" fashion. We propose a novel deep Bayesian bandits framework that uses a functional prior to approximate the posterior for drug response prediction based on multi-modal information consisting of genomic features and drug structure. We empirically evaluate our method on three large-scale in vitro pharmacogenomic datasets and show that our approach outperforms several benchmarks in identifying the optimal treatment for a given cell line.  ( 2 min )
    FedNLP: Benchmarking Federated Learning Methods for Natural Language Processing Tasks. (arXiv:2104.08815v3 [cs.CL] UPDATED)
    Increasing concerns and regulations about data privacy and sparsity necessitate the study of privacy-preserving, decentralized learning methods for natural language processing (NLP) tasks. Federated learning (FL) provides promising approaches for a large number of clients (e.g., personal devices or organizations) to collaboratively learn a shared global model to benefit all clients while allowing users to keep their data locally. Despite interest in studying FL methods for NLP tasks, a systematic comparison and analysis is lacking in the literature. Herein, we present FedNLP, a benchmarking framework for evaluating federated learning methods on four different task formulations: text classification, sequence tagging, question answering, and seq2seq. We propose a universal interface between Transformer-based language models (e.g., BERT, BART) and FL methods (e.g., FedAvg, FedOPT, etc.) under various non-IID partitioning strategies. Our extensive experiments with FedNLP provide empirical comparisons between FL methods and help us better understand the inherent challenges of this direction. The comprehensive analysis points to intriguing and exciting future research aimed at developing FL methods for NLP tasks.  ( 2 min )
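The aggregation step of FedAvg, one of the FL methods named in the abstract, can be sketched as a sample-size-weighted average of client parameters. This illustrates only the server-side averaging rule, under the assumption that each client reports a flat parameter vector and its local sample count; the function name and the toy numbers are hypothetical, and FedNLP's actual interface is not shown.

```python
def fed_avg(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[d] * n for w, n in zip(client_weights, client_sizes)) / total
        for d in range(dim)
    ]


# Two hypothetical clients: the second holds three times as much data,
# so the global model lands closer to its parameters.
global_weights = fed_avg([[1.0, 2.0], [3.0, 4.0]], client_sizes=[1, 3])
print(global_weights)
```

Only these parameter vectors and counts leave each client, which is how the raw text data stays local.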
    Probabilistic learning constrained by realizations using a weak formulation of Fourier transform of probability measures. (arXiv:2205.03078v1 [stat.ML])
    This paper deals with taking into account a given set of realizations as constraints in the Kullback-Leibler minimum principle, which is used as a probabilistic learning algorithm. This permits the effective integration of data into predictive models. We consider the probabilistic learning of a random vector that is made up of either a quantity of interest (QoI) (unsupervised case) or the couple of the QoI and a control parameter (supervised case). A training set of independent realizations of this random vector is assumed to be given and to be generated with a prior probability measure that is unknown. A target set of realizations of the QoI is available for the two considered cases. The framework is that of non-Gaussian problems in high dimension. A functional approach is developed on the basis of a weak formulation of the Fourier transform of probability measures (characteristic functions). The construction makes it possible to take into account the target set of realizations of the QoI in the Kullback-Leibler minimum principle. The proposed approach allows for estimating the posterior probability measure of the QoI (unsupervised case) or the posterior joint probability measure of the QoI with the control parameter (supervised case). The existence and the uniqueness of the posterior probability measure are analyzed for the two cases. The numerical aspects are detailed in order to facilitate the implementation of the proposed method. The presented application in high dimension demonstrates the efficiency and the robustness of the proposed algorithm.  ( 2 min )
    Imperceptible Backdoor Attack: From Input Space to Feature Representation. (arXiv:2205.03190v1 [cs.CR])
    Backdoor attacks are rapidly emerging threats to deep neural networks (DNNs). In the backdoor attack scenario, attackers usually implant the backdoor into the target model by manipulating the training dataset or training process. Then, the compromised model behaves normally for benign input yet makes mistakes when the pre-defined trigger appears. In this paper, we analyze the drawbacks of existing attack approaches and propose a novel imperceptible backdoor attack. We treat the trigger pattern as a special kind of noise following a multinomial distribution. A U-net-based network is employed to generate concrete parameters of the multinomial distribution for each benign input. This elaborated trigger ensures that our approach is invisible to both humans and statistical detection. Besides the design of the trigger, we also consider the robustness of our approach against model diagnosis-based defences. We force the feature representation of malicious input stamped with the trigger to be entangled with the benign one. We demonstrate the effectiveness and robustness against multiple state-of-the-art defences through extensive datasets and networks. Our trigger modifies fewer than 1% of the pixels of a benign image, while the modification magnitude is 1. Our source code is available at https://github.com/Ekko-zn/IJCAI2022-Backdoor.  ( 2 min )
    The NT-Xent loss upper bound. (arXiv:2205.03169v1 [cs.LG])
    Self-supervised learning is a growing paradigm in deep representation learning, showing great generalization capabilities and competitive performance in low-labeled data regimes. The SimCLR framework proposes the NT-Xent loss for contrastive representation learning. The objective of the loss function is to maximize agreement, similarity, between sampled positive pairs. This short paper derives and proposes an upper bound for the loss and average similarity. An analysis of the implications is however not provided, but we strongly encourage anyone in the field to conduct this.  ( 2 min )
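    For readers unfamiliar with the loss discussed above, the following is a minimal sketch of NT-Xent as defined in SimCLR, written in plain Python. The function names, the pairing convention (items 2k and 2k+1 form a positive pair), and the default temperature are assumptions made for illustration; the bound derived in the paper is not reproduced here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent(embeddings, temperature=0.5):
    """Mean NT-Xent loss over all items.

    Assumption for this sketch: the list has even length and
    embeddings[2k], embeddings[2k + 1] are the two views of sample k.
    """
    n = len(embeddings)
    total = 0.0
    for i in range(n):
        j = i ^ 1  # index of the positive partner of item i
        positive = cosine(embeddings[i], embeddings[j]) / temperature
        # The denominator runs over every other item, positive included.
        logits = [
            cosine(embeddings[i], embeddings[k]) / temperature
            for k in range(n)
            if k != i
        ]
        total += math.log(sum(math.exp(x) for x in logits)) - positive
    return total / n


# Toy check: pairs that agree give a lower loss than pairs that disagree.
aligned = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
misaligned = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
loss_aligned = nt_xent(aligned)
loss_misaligned = nt_xent(misaligned)
```

    Because the positive term also appears in the denominator, each per-item loss is non-negative, and it shrinks as the positive pair becomes more similar relative to the other items.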
    UAV-aided RF Mapping for Sensing and Connectivity in Wireless Networks. (arXiv:2205.03335v1 [cs.IT])
    The use of unmanned aerial vehicles (UAVs) as flying radio access network (RAN) nodes offers a promising complement to traditional fixed terrestrial deployments. More recently, yet still in the context of wireless networks, drones have also been envisioned for use as radio frequency (RF) sensing and localization devices. In both cases, the advantage of using UAVs lies in their ability to navigate freely in 3D and in a timely manner to locations in space where the obtained network throughput or sensing performance is optimal. In practice, the selection of a proper location or trajectory for the UAV very much depends on local terrain features, including the position of surrounding radio obstacles. Hence, the robot must be able to map the features of its radio environment as it performs its data communication or sensing services. The challenges related to this task, referred to here as radio mapping, are discussed in this paper. Its promises related to efficient trajectory design for autonomous radio-aware UAVs are highlighted, along with algorithmic solutions. The advantages induced by radio mapping in terms of connectivity, sensing, and localization performance are illustrated.  ( 2 min )
    Immiscible Color Flows in Optimal Transport Networks for Image Classification. (arXiv:2205.02938v1 [cs.CV])
    In classification tasks, it is crucial to meaningfully exploit information contained in data. Here, we propose a physics-inspired dynamical system that adapts Optimal Transport principles to effectively leverage color distributions of images. Our dynamics regulates immiscible fluxes of colors traveling on a network built from images. Instead of aggregating colors together, it treats them as different commodities that interact with a shared capacity on edges. Our method outperforms competitor algorithms on image classification tasks in datasets where color information matters.  ( 2 min )
    Crop Type Identification for Smallholding Farms: Analyzing Spatial, Temporal and Spectral Resolutions in Satellite Imagery. (arXiv:2205.03104v1 [cs.CV])
    The integration of the modern Machine Learning (ML) models into remote sensing and agriculture has expanded the scope of the application of satellite images in the agriculture domain. In this paper, we present how the accuracy of crop type identification improves as we move from medium-spatiotemporal-resolution (MSTR) to high-spatiotemporal-resolution (HSTR) satellite images. We further demonstrate that high spectral resolution in satellite imagery can improve prediction performance for low spatial and temporal resolutions (LSTR) images. The F1-score is increased by 7% when using multispectral data of MSTR images as compared to the best results obtained from HSTR images. Similarly, when crop season based time series of multispectral data is used we observe an increase of 1.2% in the F1-score. The outcome motivates further advancements in the field of synthetic band generation.  ( 2 min )
    Low-rank Tensor Learning with Nonconvex Overlapped Nuclear Norm Regularization. (arXiv:2205.03059v1 [cs.LG])
    Nonconvex regularization has been popularly used in low-rank matrix learning. However, extending it for low-rank tensor learning is still computationally expensive. To address this problem, we develop an efficient solver for use with a nonconvex extension of the overlapped nuclear norm regularizer. Based on the proximal average algorithm, the proposed algorithm can avoid expensive tensor folding/unfolding operations. A special "sparse plus low-rank" structure is maintained throughout the iterations, and allows fast computation of the individual proximal steps. Empirical convergence is further improved with the use of adaptive momentum. We provide convergence guarantees to critical points on smooth losses and also on objectives satisfying the Kurdyka-Łojasiewicz condition. While the optimization problem is nonconvex and nonsmooth, we show that its critical points still have good statistical performance on the tensor completion problem. Experiments on various synthetic and real-world data sets show that the proposed algorithm is efficient in both time and space and more accurate than the existing state-of-the-art.  ( 2 min )
    GANs as Gradient Flows that Converge. (arXiv:2205.02910v1 [cs.LG])
    This paper approaches the unsupervised learning problem by gradient descent in the space of probability density functions. Our main result shows that along the gradient flow induced by a distribution-dependent ordinary differential equation (ODE), the unknown data distribution emerges as the long-time limit of this flow of densities. That is, one can uncover the data distribution by simulating the distribution-dependent ODE. Intriguingly, we find that the simulation of the ODE is equivalent to the training of generative adversarial networks (GANs). The GAN framework, by definition a non-cooperative game between a generator and a discriminator, can therefore be viewed alternatively as a cooperative game between a navigator and a calibrator (in collaboration to simulate the ODE). At the theoretic level, this new perspective simplifies the analysis of GANs and gives new insight into their performance. To construct a solution to the distribution-dependent ODE, we first show that the associated nonlinear Fokker-Planck equation has a unique weak solution, using the Crandall-Liggett theorem for differential equations in Banach spaces. From this solution to the Fokker-Planck equation, we construct a unique solution to the ODE, relying on Trevisan's superposition principle. The convergence of the induced gradient flow to the data distribution is obtained by analyzing the Fokker-Planck equation.  ( 2 min )
    Geodesics, Non-linearities and the Archive of Novelty Search. (arXiv:2205.03162v1 [cs.LG])
    The Novelty Search (NS) algorithm was proposed more than a decade ago. However, the mechanisms behind its empirical success are still not well formalized/understood. This short note focuses on the effects of the archive on exploration. Experimental evidence from a few application domains suggests that archive-based NS performs in general better than when Novelty is solely computed with respect to the population. An argument that is often encountered in the literature is that the archive prevents exploration from backtracking or cycling, i.e. from revisiting previously encountered areas in the behavior space. We argue that this is not a complete or accurate explanation as backtracking - beside often being desirable - can actually be enabled by the archive. Through low-dimensional/analytical examples, we show that a key effect of the archive is that it counterbalances the exploration biases that result, among other factors, from the use of inadequate behavior metrics and the non-linearities of the behavior mapping. Our observations seem to hint that attributing a more active role to the archive in sampling can be beneficial.  ( 2 min )
    Real Time On Sensor Gait Phase Detection with 0.5KB Deep Learning Model. (arXiv:2205.03234v1 [eess.SP])
    Gait phase detection with convolutional neural networks provides accurate classification but demands high computational cost, which inhibits real-time, low-power on-sensor processing. This paper presents segmentation-based gait phase detection with a width- and depth-downscaled U-Net-like model that needs only a 0.5KB model size and 67K operations per second, with 95.9% accuracy, so that it fits easily into a resource-limited on-sensor microcontroller.  ( 2 min )
    Sound2Synth: Interpreting Sound via FM Synthesizer Parameters Estimation. (arXiv:2205.03043v1 [cs.SD])
    A synthesizer is a type of electronic musical instrument that is now widely used in modern music production and sound design. Each parameter configuration of a synthesizer produces a unique timbre and can be viewed as a unique instrument. Estimating the parameter configuration that best restores a given sound timbre, i.e., the synthesizer parameter estimation problem, is important yet complicated. We propose a multi-modal deep-learning-based pipeline, Sound2Synth, together with a network structure, Prime-Dilated Convolution (PDC), specially designed to solve this problem. Our method achieves not only SOTA results but also the first real-world applicable results on the Dexed synthesizer, a popular FM synthesizer.  ( 2 min )
    New-Onset Diabetes Assessment Using Artificial Intelligence-Enhanced Electrocardiography. (arXiv:2205.02900v1 [cs.LG])
    Undiagnosed diabetes is present in 21.4% of adults with diabetes. Diabetes can remain asymptomatic and undetected due to limitations in screening rates. To address this issue, questionnaires, such as the American Diabetes Association (ADA) Risk test, have been recommended for use by physicians and the public. Based on evidence that blood glucose concentration can affect cardiac electrophysiology, we hypothesized that an artificial intelligence (AI)-enhanced electrocardiogram (ECG) could identify adults with new-onset diabetes. We trained a neural network to estimate HbA1c using a 12-lead ECG and readily available demographics. We retrospectively assembled a dataset comprised of patients with paired ECG and HbA1c data. The population of patients who receive both an ECG and HbA1c may be a biased sample of the complete outpatient population, so we adjusted the importance placed on each patient to generate a more representative pseudo-population. We found ECG-based assessment outperforms the ADA Risk test, achieving a higher area under the curve (0.80 vs. 0.68) and positive predictive value (14% vs. 9%) -- 2.6 times the prevalence of diabetes in the cohort. The AI-enhanced ECG significantly outperforms electrophysiologist interpretation of the ECG, suggesting that the task is beyond current clinical capabilities. Given the prevalence of ECGs in clinics and via wearable devices, such a tool would make precise, automated diabetes assessment widely accessible.  ( 2 min )
    The Road to Explainability is Paved with Bias: Measuring the Fairness of Explanations. (arXiv:2205.03295v1 [cs.LG])
    Machine learning models in safety-critical settings like healthcare are often blackboxes: they contain a large number of parameters which are not transparent to users. Post-hoc explainability methods where a simple, human-interpretable model imitates the behavior of these blackbox models are often proposed to help users trust model predictions. In this work, we audit the quality of such explanations for different protected subgroups using real data from four settings in finance, healthcare, college admissions, and the US justice system. Across two different blackbox model architectures and four popular explainability methods, we find that the approximation quality of explanation models, also known as the fidelity, differs significantly between subgroups. We also demonstrate that pairing explainability methods with recent advances in robust machine learning can improve explanation fairness in some settings. However, we highlight the importance of communicating details of non-zero fidelity gaps to users, since a single solution might not exist across all settings. Finally, we discuss the implications of unfair explanation models as a challenging and understudied problem facing the machine learning community.  ( 2 min )
    Semi-Supervised Imitation Learning of Team Policies from Suboptimal Demonstrations. (arXiv:2205.02959v1 [cs.AI])
    We present Bayesian Team Imitation Learner (BTIL), an imitation learning algorithm to model behavior of teams performing sequential tasks in Markovian domains. In contrast to existing multi-agent imitation learning techniques, BTIL explicitly models and infers the time-varying mental states of team members, thereby enabling learning of decentralized team policies from demonstrations of suboptimal teamwork. Further, to allow for sample- and label-efficient policy learning from small datasets, BTIL employs a Bayesian perspective and is capable of learning from semi-supervised demonstrations. We demonstrate and benchmark the performance of BTIL on synthetic multi-agent tasks as well as a novel dataset of human-agent teamwork. Our experiments show that BTIL can successfully learn team policies from demonstrations despite the influence of team members' (time-varying and potentially misaligned) mental states on their behavior.  ( 2 min )
    Explainable multi-class anomaly detection on functional data. (arXiv:2205.02935v1 [stat.ML])
    In this paper we describe an approach for anomaly detection and its explainability in multivariate functional data. The anomaly detection procedure consists of transforming the series into a vector of features and using an Isolation forest algorithm. The explainable procedure is based on the computation of the SHAP coefficients and on the use of a supervised decision tree. We apply it on simulated data to measure the performance of our method and on real data coming from industry.  ( 2 min )
    Understanding Urban Water Consumption using Remotely Sensed Data. (arXiv:2205.02932v1 [cs.CV])
    Urban metabolism is an active field of research that deals with the estimation of emissions and resource consumption from urban regions. The analysis could be carried out through a manual survey or by the implementation of elegant machine learning algorithms. In this exploratory work, we estimate the water consumption by the buildings in the region captured by satellite imagery. To this end, we break our analysis into three parts: i) identification of building pixels, given a satellite image, followed by ii) identification of the building type (residential/non-residential) from the building pixels, and finally iii) using the building pixels along with their type to estimate the water consumption using the average per unit area consumption for different building types as obtained from municipal surveys.  ( 2 min )
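    Step iii) of the pipeline above reduces to simple arithmetic once the building pixels and their types are known. The sketch below illustrates that step only; the function name, the pixel footprint, and the per-area consumption rates are hypothetical values chosen for the example and are not taken from the paper or from any municipal survey.

```python
def estimate_water_consumption(pixel_counts, pixel_area_m2, rate_per_m2):
    """Sum, over building types, of pixels * area per pixel * consumption rate."""
    return sum(
        count * pixel_area_m2 * rate_per_m2[building_type]
        for building_type, count in pixel_counts.items()
    )


# Hypothetical outputs of steps i) and ii): building pixels per type.
pixel_counts = {"residential": 1200, "non_residential": 300}
# Hypothetical ground footprint of one pixel, in square metres.
pixel_area_m2 = 0.25
# Hypothetical average consumption in litres per square metre per day.
rate_per_m2 = {"residential": 2.0, "non_residential": 5.0}

total_litres_per_day = estimate_water_consumption(
    pixel_counts, pixel_area_m2, rate_per_m2
)
```

    With these made-up inputs the residential share is 1200 x 0.25 x 2.0 = 600 and the non-residential share is 300 x 0.25 x 5.0 = 375, so the estimate is 975 litres per day.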
    Multi-confound regression adversarial network for deep learning-based diagnosis on highly heterogenous clinical data. (arXiv:2205.02885v1 [cs.LG])
    Automated disease detection in medical images using deep learning holds promise to improve the diagnostic ability of radiologists, but routinely collected clinical data frequently contains technical and demographic confounding factors that differ between hospitals, negatively affecting the robustness of diagnostic deep learning models. Thus, there is a critical need for deep learning models that can train on imbalanced datasets without overfitting to site-specific confounding factors. In this work, we developed a novel deep learning architecture, MUCRAN (Multi-Confound Regression Adversarial Network), to train a deep learning model on highly heterogeneous clinical data while regressing demographic and technical confounding factors. We trained MUCRAN using 16,821 clinical T1 Axial brain MRIs collected from Massachusetts General Hospital before 2019 and tested it using post-2019 data to distinguish Alzheimer's disease (AD) patients, identified using both prescriptions of AD drugs and ICD codes, from a non-medicated control group. In external validation tests using MRI data from other hospitals, the model showed a robust performance of over 90% accuracy on newly collected data. This work shows the feasibility of deep learning-based diagnosis in real-world clinical data.  ( 2 min )
    Over-The-Air Federated Learning under Byzantine Attacks. (arXiv:2205.02949v1 [cs.LG])
    Federated learning (FL) is a promising solution to enable many AI applications, where sensitive datasets from distributed clients are needed for collaboratively training a global model. FL allows the clients to participate in the training phase, governed by a central server, without sharing their local data. One of the main challenges of FL is the communication overhead, where the model updates of the participating clients are sent to the central server at each global training round. Over-the-air computation (AirComp) has been recently proposed to alleviate the communication bottleneck where the model updates are sent simultaneously over the multiple-access channel. However, simple averaging of the model updates via AirComp makes the learning process vulnerable to random or intended modifications of the local model updates of some Byzantine clients. In this paper, we propose a transmission and aggregation framework to reduce the effect of such attacks while preserving the benefits of AirComp for FL. For the proposed robust approach, the central server divides the participating clients randomly into groups and allocates a transmission time slot for each group. The updates of the different groups are then aggregated using a robust aggregation technique. We extend our approach to handle the case of non-i.i.d. local data, where a resampling step is added before robust aggregation. We analyze the convergence of the proposed approach for both cases of i.i.d. and non-i.i.d. data and demonstrate that the proposed algorithm converges at a linear rate to a neighborhood of the optimal solution. Experiments on real datasets are provided to confirm the robustness of the proposed approach.  ( 2 min )
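    The grouping idea described above can be sketched as follows: clients are split randomly into groups, each group's updates are averaged (standing in for what over-the-air computation would deliver in one time slot), and the group averages are then combined robustly. The abstract does not name the robust aggregation rule, so the coordinate-wise median used here is an assumption, as are the function name and the toy data.

```python
import random
import statistics


def grouped_robust_aggregate(updates, num_groups, seed=0):
    """Average updates within random groups, then take a coordinate-wise median.

    Assumes len(updates) >= num_groups so that no group is empty.
    """
    rng = random.Random(seed)
    order = list(range(len(updates)))
    rng.shuffle(order)  # random assignment of clients to groups
    groups = [order[g::num_groups] for g in range(num_groups)]
    dim = len(updates[0])
    # Per-group mean, one vector per transmission time slot.
    group_means = [
        [sum(updates[i][d] for i in group) / len(group) for d in range(dim)]
        for group in groups
    ]
    # Robust step (assumed): median of the group means in every coordinate.
    return [statistics.median(m[d] for m in group_means) for d in range(dim)]


# Nine honest clients and one Byzantine client sending an extreme update.
updates = [[1.0, 1.0] for _ in range(9)] + [[1000.0, -1000.0]]
robust = grouped_robust_aggregate(updates, num_groups=5)
naive_mean = [sum(u[d] for u in updates) / len(updates) for d in range(2)]
```

    In this toy case the single corrupted update contaminates only one of the five group averages, so the median ignores it, whereas the plain average over all clients is pulled far from the honest value.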
    Quantification of Robotic Surgeries with Vision-Based Deep Learning. (arXiv:2205.03028v1 [cs.RO])
    Surgery is a high-stakes domain where surgeons must navigate critical anatomical structures and actively avoid potential complications while achieving the main task at hand. Such surgical activity has been shown to affect long-term patient outcomes. To better understand this relationship, whose mechanics remain unknown for the majority of surgical procedures, we hypothesize that the core elements of surgery must first be quantified in a reliable, objective, and scalable manner. We believe this is a prerequisite for the provision of surgical feedback and modulation of surgeon performance in pursuit of improved patient outcomes. To holistically quantify surgeries, we propose a unified deep learning framework, entitled Roboformer, which operates exclusively on videos recorded during surgery to independently achieve multiple tasks: surgical phase recognition (the what of surgery), gesture classification and skills assessment (the how of surgery). We validated our framework on four video-based datasets of two commonly-encountered types of steps (dissection and suturing) within minimally-invasive robotic surgeries. We demonstrated that our framework can generalize well to unseen videos, surgeons, medical centres, and surgical procedures. We also found that our framework, which naturally lends itself to explainable findings, identified relevant information when achieving a particular task. These findings are likely to instill surgeons with more confidence in our framework's behaviour, increasing the likelihood of clinical adoption, and thus paving the way for more targeted surgical feedback.  ( 2 min )
    Low Dimensional Invariant Embeddings for Universal Geometric Learning. (arXiv:2205.02956v1 [cs.LG])
    This paper studies separating invariants: mappings on $d$-dimensional semi-algebraic subsets of $D$-dimensional Euclidean domains which are invariant to semi-algebraic group actions and separate orbits. The motivation for this study comes from the usefulness of separating invariants in proving universality of equivariant neural network architectures. We observe that in several cases the cardinality of separating invariants proposed in the machine learning literature is much larger than the ambient dimension $D$. As a result, the theoretical universal constructions based on these separating invariants are unrealistically large. Our goal in this paper is to resolve this issue. We show that when a continuous family of semi-algebraic separating invariants is available, separation can be obtained by randomly selecting $2d+1$ of these invariants. We apply this methodology to obtain an efficient scheme for computing separating invariants for several classical group actions which have been studied in the invariant learning literature. Examples include matrix multiplication actions on point clouds by permutations, rotations, and various other linear groups.  ( 2 min )
    Large Scale Transfer Learning for Differentially Private Image Classification. (arXiv:2205.02973v1 [cs.LG])
    Differential Privacy (DP) provides a formal framework for training machine learning models with individual example level privacy. Training models with DP protects the model against leakage of sensitive data in a potentially adversarial setting. In the field of deep learning, Differentially Private Stochastic Gradient Descent (DP-SGD) has emerged as a popular private training algorithm. Private training using DP-SGD protects against leakage by injecting noise into individual example gradients, such that the trained model weights become nearly independent of the use of any particular training example. While this result is quite appealing, the computational cost of training large-scale models with DP-SGD is substantially higher than non-private training. This is further exacerbated by the fact that increasing the number of parameters leads to larger degradation in utility with DP. In this work, we zoom in on the ImageNet dataset and demonstrate that, similar to the non-private case, pre-training over-parameterized models on a large public dataset can lead to substantial gains when the model is finetuned privately. Moreover, by systematically comparing private and non-private models across a range of huge batch sizes, we find that, similar to the non-private setting, the choice of optimizer can further improve performance substantially with DP. By switching from DP-SGD to DP-LAMB we saw an improvement of up to 20 percentage points (absolute). Finally, we show that finetuning just the last layer for a single step in the full batch setting leads to SOTA results of 81.7% under a wide privacy budget range of $\epsilon \in [4, 10]$ and $\delta = 10^{-6}$ while minimizing the computational overhead substantially.  ( 2 min )
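    As a rough illustration of the DP-SGD mechanism mentioned above, the sketch below computes one privatized gradient in the standard formulation: each per-example gradient is clipped to a maximum L2 norm, the clipped gradients are summed, Gaussian noise scaled by the clipping norm is added, and the result is averaged. The function names and toy values are assumptions for the example; this is not the paper's training code, and it does no privacy accounting.

```python
import math
import random


def clip(grad, max_norm):
    """Rescale grad so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each example's gradient, sum, add Gaussian noise, then average."""
    clipped = [clip(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[d] for g in clipped) for d in range(dim)]
    # Noise standard deviation is noise_multiplier * max_norm per coordinate.
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [x / len(per_example_grads) for x in noisy]


rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]  # toy per-example gradients
# With noise_multiplier=0.0 only the clipping step is visible.
clipped_only = dp_sgd_gradient(grads, max_norm=1.0, noise_multiplier=0.0, rng=rng)
# A positive multiplier perturbs every coordinate of the summed gradient.
private = dp_sgd_gradient(grads, max_norm=1.0, noise_multiplier=1.0, rng=rng)
```

    Clipping bounds how much any single example can move the update, which is what lets the added noise mask the presence or absence of that example.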
    Evaluating Context for Deep Object Detectors. (arXiv:2205.02887v1 [cs.CV])
    Which object detector is suitable for your context sensitive task? Deep object detectors exploit scene context for recognition differently. In this paper, we group object detectors into 3 categories in terms of context use: no context by cropping the input (RCNN), partial context by cropping the featuremap (two-stage methods) and full context without any cropping (single-stage methods). We systematically evaluate the effect of context for each deep detector category. We create a fully controlled dataset for varying context and investigate the context for deep detectors. We also evaluate gradually removing the background context and the foreground object on MS COCO. We demonstrate that single-stage and two-stage object detectors can and will use the context by virtue of their large receptive field. Thus, choosing the best object detector may depend on the application context.  ( 2 min )
    Learn-to-Race Challenge 2022: Benchmarking Safe Learning and Cross-domain Generalisation in Autonomous Racing. (arXiv:2205.02953v1 [cs.RO])
    We present the results of our autonomous racing virtual challenge, based on the newly-released Learn-to-Race (L2R) simulation framework, which seeks to encourage interdisciplinary research in autonomous driving and to help advance the state of the art on a realistic benchmark. Analogous to racing being used to test cutting-edge vehicles, we envision autonomous racing to serve as a particularly challenging proving ground for autonomous agents as: (i) they need to make sub-second, safety-critical decisions in a complex, fast-changing environment; and (ii) both perception and control must be robust to distribution shifts, novel road features, and unseen obstacles. Thus, the main goal of the challenge is to evaluate the joint safety, performance, and generalisation capabilities of reinforcement learning agents on multi-modal perception, through a two-stage process. In the first stage of the challenge, we evaluate an autonomous agent's ability to drive as fast as possible, while adhering to safety constraints. In the second stage, we additionally require the agent to adapt to an unseen racetrack through safe exploration. In this paper, we describe the new L2R Task 2.0 benchmark, with refined metrics and baseline approaches. We also provide an overview of deployment, evaluation, and rankings for the inaugural instance of the L2R Autonomous Racing Virtual Challenge (supported by Carnegie Mellon University, Arrival Ltd., AICrowd, Amazon Web Services, and Honda Research), which officially used the new L2R Task 2.0 benchmark and received over 20,100 views, 437 active participants, 46 teams, and 733 model submissions -- from 88 unique institutions, in 28 different countries. Finally, we release leaderboard results from the challenge and provide description of the two top-ranking approaches in cross-domain model transfer, across multiple sensor configurations and simulated races.  ( 3 min )
    Neural Jacobian Fields: Learning Intrinsic Mappings of Arbitrary Meshes. (arXiv:2205.02904v1 [cs.GR])
    This paper introduces a framework designed to accurately predict piecewise linear mappings of arbitrary meshes via a neural network, enabling training and evaluating over heterogeneous collections of meshes that do not share a triangulation, as well as producing highly detail-preserving maps whose accuracy exceeds current state of the art. The framework is based on reducing the neural aspect to a prediction of a matrix for a single given point, conditioned on a global shape descriptor. The field of matrices is then projected onto the tangent bundle of the given mesh, and used as candidate jacobians for the predicted map. The map is computed by a standard Poisson solve, implemented as a differentiable layer with cached pre-factorization for efficient training. This construction is agnostic to the triangulation of the input, thereby enabling applications on datasets with varying triangulations. At the same time, by operating in the intrinsic gradient domain of each individual mesh, it allows the framework to predict highly-accurate mappings. We validate these properties by conducting experiments over a broad range of scenarios, from semantic ones such as morphing, registration, and deformation transfer, to optimization-based ones, such as emulating elastic deformations and contact correction, as well as being the first work, to our knowledge, to tackle the task of learning to compute UV parameterizations of arbitrary meshes. The results exhibit the high accuracy of the method as well as its versatility, as it is readily applied to the above scenarios without any changes to the framework.  ( 2 min )
    Putting Density Functional Theory to the Test in Machine-Learning-Accelerated Materials Discovery. (arXiv:2205.02967v1 [cond-mat.mtrl-sci])
    Accelerated discovery with machine learning (ML) has begun to provide the advances in efficiency needed to overcome the combinatorial challenge of computational materials design. Nevertheless, ML-accelerated discovery both inherits the biases of training data derived from density functional theory (DFT) and leads to many attempted calculations that are doomed to fail. Many compelling functional materials and catalytic processes involve strained chemical bonds, open-shell radicals and diradicals, or metal-organic bonds to open-shell transition-metal centers. Although promising targets, these materials present unique challenges for electronic structure methods and combinatorial challenges for their discovery. In this Perspective, we describe the advances needed in accuracy, efficiency, and approach beyond what is typical in conventional DFT-based ML workflows. These challenges have begun to be addressed through ML models trained to predict the results of multiple methods or the differences between them, enabling quantitative sensitivity analysis. For DFT to be trusted for a given data point in a high-throughput screen, it must pass a series of tests. ML models that predict the likelihood of calculation success and detect the presence of strong correlation will enable rapid diagnoses and adaptation strategies. These "decision engines" represent the first steps toward autonomous workflows that avoid the need for expert determination of the robustness of DFT-based materials discoveries.  ( 2 min )
    GreenDB: Toward a Product-by-Product Sustainability Database. (arXiv:2205.02908v1 [cs.LG])
    The production, shipping, usage, and disposal of consumer goods have a substantial impact on greenhouse gas emissions and the depletion of resources. Modern retail platforms rely heavily on Machine Learning (ML) for their search and recommender systems. Thus, ML can potentially support efforts towards more sustainable consumption patterns, for example, by accounting for sustainability aspects in product search or recommendations. However, leveraging ML potential for reaching sustainability goals requires data on sustainability. Unfortunately, no open and publicly available database integrates sustainability information on a product-by-product basis. In this work, we present the GreenDB, which fills this gap. Based on search logs of millions of users, we prioritize which products users care about most. The GreenDB schema extends the well-known schema.org Product definition and can be readily integrated into existing product catalogs to improve sustainability information available for search and recommendation experiences. We present our proof of concept implementation of a scraping system that creates the GreenDB dataset.  ( 2 min )
    Lagrangian PINNs: A causality-conforming solution to failure modes of physics-informed neural networks. (arXiv:2205.02902v1 [cs.LG])
    Physics-informed neural networks (PINNs) leverage neural-networks to find the solutions of partial differential equation (PDE)-constrained optimization problems with initial conditions and boundary conditions as soft constraints. These soft constraints are often considered to be the sources of the complexity in the training phase of PINNs. Here, we demonstrate that the challenge of training (i) persists even when the boundary conditions are strictly enforced, and (ii) is closely related to the Kolmogorov n-width associated with problems demonstrating transport, convection, traveling waves, or moving fronts. Given this realization, we describe the mechanism underlying the training schemes such as those used in eXtended PINNs (XPINN), curriculum regularization, and sequence-to-sequence learning. For an important category of PDEs, i.e., governed by non-linear convection-diffusion equation, we propose reformulating PINNs on a Lagrangian frame of reference, i.e., LPINNs, as a PDE-informed solution. A parallel architecture with two branches is proposed. One branch solves for the state variables on the characteristics, and the second branch solves for the low-dimensional characteristics curves. The proposed architecture conforms to the causality innate to the convection, and leverages the direction of travel of the information in the domain. Finally, we demonstrate that the loss landscapes of LPINNs are less sensitive to the so-called "complexity" of the problems, compared to those in the traditional PINNs in the Eulerian framework.  ( 2 min )
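    To make the Lagrangian reformulation concrete, consider the viscous Burgers equation as one example of a non-linear convection-diffusion equation. The choice of equation and the symbols $u$, $\nu$ and $X(t)$ are assumptions made here for illustration; the abstract does not specify them. Along a characteristic curve $X(t)$ that moves with the local velocity, the convective term is absorbed into the total derivative:

```latex
\begin{aligned}
\text{Eulerian frame:}\quad
  & \frac{\partial u}{\partial t} + u\,\frac{\partial u}{\partial x}
    = \nu\,\frac{\partial^2 u}{\partial x^2}, \\
\text{Lagrangian frame:}\quad
  & \frac{dX}{dt} = u\bigl(X(t), t\bigr), \qquad
    \frac{d}{dt}\,u\bigl(X(t), t\bigr)
    = \nu\,\frac{\partial^2 u}{\partial x^2}\bigl(X(t), t\bigr).
\end{aligned}
```

    The second line follows from the chain rule, $\frac{d}{dt}u(X(t),t) = u_t + \frac{dX}{dt}\,u_x$. Under this reading, one network branch would represent the characteristic curves $X(t)$ and the other the state $u$ along them, which is consistent with the two-branch description in the abstract.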
    Understanding Transfer Learning for Chest Radiograph Clinical Report Generation with Modified Transformer Architectures. (arXiv:2205.02841v1 [eess.IV])
    The image captioning task is increasingly prevalent in artificial intelligence applications for medicine. One important application is clinical report generation from chest radiographs. The clinical writing of unstructured reports is time-consuming and error-prone. An automated system would improve standardization, reduce errors and time spent, and broaden medical accessibility. In this paper we demonstrate the importance of domain-specific pre-training and propose a modified transformer architecture for the medical image captioning task. To accomplish this, we train a series of modified transformers to generate clinical reports from chest radiograph image input. These modified transformers include: a meshed-memory augmented transformer architecture with visual extractor using ImageNet pre-trained weights, a meshed-memory augmented transformer architecture with visual extractor using CheXpert pre-trained weights, and a meshed-memory augmented transformer whose encoder is passed the concatenated embeddings using both ImageNet pre-trained weights and CheXpert pre-trained weights. We use BLEU(1-4), ROUGE-L, CIDEr, and the clinical CheXbert F1 scores to validate our models and demonstrate competitive scores with state-of-the-art models. We provide evidence that ImageNet pre-training is ill-suited for the medical image captioning task, especially for less frequent conditions (e.g., enlarged cardiomediastinum, lung lesion, pneumothorax). Furthermore, we demonstrate that the double feature model improves performance for specific medical conditions (edema, consolidation, pneumothorax, support devices) and overall CheXbert F1 score, and should be further developed in future work. Such a double feature model, including both ImageNet pre-training and domain-specific pre-training, could be used in a wide range of image captioning models in medicine.  ( 2 min )
    Exploiting Ligand Additivity for Transferable Machine Learning of Multireference Character Across Known Transition Metal Complex Ligands. (arXiv:2205.02879v1 [cond-mat.mtrl-sci])
    Accurate virtual high-throughput screening (VHTS) of transition metal complexes (TMCs) remains challenging due to the possibility of high multi-reference (MR) character that complicates property evaluation. We compute MR diagnostics for over 5,000 ligands present in previously synthesized transition metal complexes in the Cambridge Structural Database (CSD). To accomplish this task, we introduce an iterative approach for consistent ligand charge assignment for ligands in the CSD. Across this set, we observe that MR character correlates linearly with the inverse value of the averaged bond order over all bonds in the molecule. We then demonstrate that ligand additivity of MR character holds in TMCs, which suggests that the TMC MR character can be inferred from the sum of the MR character of the ligands. Encouraged by this observation, we leverage ligand additivity and develop a ligand-derived machine learning representation to train neural networks to predict the MR character of TMCs from properties of the constituent ligands. This approach yields models with excellent performance and superior transferability to unseen ligand chemistry and compositions.  ( 2 min )
    Generative Adversarial Network Based Synthetic Learning and a Novel Domain Relevant Loss Term for Spine Radiographs. (arXiv:2205.02843v1 [eess.IV])
    Problem: There is a lack of big data for the training of deep learning models in medicine, characterized by the time cost of data collection and privacy concerns. Generative adversarial networks (GANs) offer both the potential to generate new data, as well as to use this newly generated data, without inclusion of patients' real data, for downstream applications. Approach: A series of GANs were trained and applied for a downstream computer vision spine radiograph abnormality classification task. Separate classifiers were trained with either access or no access to the original imaging. Trained GANs included a conditional StyleGAN2 with adaptive discriminator augmentation, a conditional StyleGAN2 with adaptive discriminator augmentation to generate spine radiographs conditional on lesion type, and using a novel clinical loss term for the generator a StyleGAN2 with adaptive discriminator augmentation conditional on abnormality (SpineGAN). Finally, a differential privacy imposed StyleGAN2 with adaptive discriminator augmentation conditional on abnormality was trained and an ablation study was performed on its differential privacy impositions. Key Results: We accomplish GAN generation of synthetic spine radiographs without meaningful input for the first time from a literature review. We further demonstrate the success of synthetic learning for the spine domain with a downstream clinical classification task (AUC of 0.830 using synthetic data compared to AUC of 0.886 using the real data). Importantly, the introduction of a new clinical loss term for the generator was found to increase generation recall as well as accelerate model training. Lastly, we demonstrate that, in a limited size medical dataset, differential privacy impositions severely impede GAN training, finding that this is specifically due to the requirement for gradient perturbation with noise.  ( 2 min )
    AdaTriplet: Adaptive Gradient Triplet Loss with Automatic Margin Learning for Forensic Medical Image Matching. (arXiv:2205.02849v1 [eess.IV])
    This paper tackles the challenge of forensic medical image matching (FMIM) using deep neural networks (DNNs). FMIM is a particular case of content-based image retrieval (CBIR). The main challenge in FMIM, compared to the general case of CBIR, is that the subject to whom a query image belongs may be affected by aging and progressive degenerative disorders, making it difficult to match data on a subject level. CBIR with DNNs is generally solved by minimizing a ranking loss, such as Triplet loss (TL), computed on image representations extracted by a DNN from the original data. TL, in particular, operates on triplets: anchor, positive (similar to anchor) and negative (dissimilar to anchor). Although TL has been shown to perform well in many CBIR tasks, it still has limitations, which we identify and analyze in this work. In this paper, we introduce (i) the AdaTriplet loss -- an extension of TL whose gradients adapt to different difficulty levels of negative samples, and (ii) the AutoMargin method -- a technique to adjust hyperparameters of margin-based losses such as TL and our proposed loss dynamically. Our results are evaluated on two large-scale benchmarks for FMIM based on the Osteoarthritis Initiative and Chest X-ray-14 datasets. The codes allowing replication of this study have been made publicly available at https://github.com/Oulu-IMEDS/AdaTriplet.  ( 2 min )
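    For readers unfamiliar with the baseline that AdaTriplet extends, the following is a minimal sketch of the standard triplet loss on a single (anchor, positive, negative) triplet. It is not the AdaTriplet loss or the AutoMargin method; the Euclidean distance, the margin value of 0.5 and the function names are assumptions chosen for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.5):
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is farther from the anchor than
    the positive by at least `margin` (an illustrative value here).
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


anchor = [0.0, 0.0]
positive = [0.0, 1.0]        # distance 1 from the anchor
easy_negative = [3.0, 4.0]   # distance 5 from the anchor
hard_negative = [1.0, 0.0]   # distance 1 from the anchor

easy = triplet_loss(anchor, positive, easy_negative)  # margin already satisfied
hard = triplet_loss(anchor, positive, hard_negative)  # margin violated
print(easy, hard)
```

    The easy negative contributes no loss (and hence no gradient), while the hard negative does; the abstract's remark about gradients adapting to the difficulty of negative samples refers to this kind of distinction.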
  • Open

    On boundary conditions parametrized by analytic functions. (arXiv:2205.03185v1 [cs.LG])
    Computer algebra can answer various questions about partial differential equations using symbolic algorithms. However, the inclusion of data into equations is rare in computer algebra. Therefore, recently, computer algebra models have been combined with Gaussian processes, a regression model in machine learning, to describe the behavior of certain differential equations under data. While it was possible to describe polynomial boundary conditions in this context, we extend these models to analytic boundary conditions. Additionally, we describe the necessary algorithms for Gröbner and Janet bases of Weyl algebras with certain analytic coefficients. Using these algorithms, we provide examples of divergence-free flow in domains bounded by analytic functions and adapted to observations.  ( 2 min )
    Tensor Principal Component Analysis in High Dimensional CP Models. (arXiv:2108.04428v3 [stat.ML] UPDATED)
    The CP decomposition for high dimensional non-orthogonal spiked tensors is an important problem with broad applications across many disciplines. However, previous works with theoretical guarantee typically assume restrictive incoherence conditions on the basis vectors for the CP components. In this paper, we propose new computationally efficient composite PCA and concurrent orthogonalization algorithms for tensor CP decomposition with theoretical guarantees under mild incoherence conditions. The composite PCA applies the principal component or singular value decompositions twice, first to a matrix unfolding of the tensor data to obtain singular vectors and then to the matrix folding of the singular vectors obtained in the first step. It can be used as an initialization for any iterative optimization schemes for the tensor CP decomposition. The concurrent orthogonalization algorithm iteratively estimates the basis vector in each mode of the tensor by simultaneously applying projections to the orthogonal complements of the spaces generated by other CP components in other modes. It is designed to improve the alternating least squares estimator and other forms of the high order orthogonal iteration for tensors with low or moderately high CP ranks, and it is guaranteed to converge rapidly when the error of any given initial estimator is bounded by a small constant. Our theoretical investigation provides estimation accuracy and convergence rates for the two proposed algorithms. Both proposed algorithms are applicable to deterministic tensor, its noisy version, and the order-$2K$ covariance tensor of order-$K$ tensor data in a factor model with uncorrelated factors. Our implementations on synthetic data demonstrate significant practical superiority of our approach over existing methods.  ( 2 min )
    Gaussian Processes for Missing Value Imputation. (arXiv:2204.04648v2 [stat.ML] UPDATED)
    Missing values are common in many real-life datasets. However, most current machine learning methods cannot handle missing values, so these must be imputed beforehand. Gaussian Processes (GPs) are non-parametric models with accurate uncertainty estimates that, combined with sparse approximations and stochastic variational inference, scale to large data sets. Sparse GPs can be used to compute a predictive distribution for missing data. Here, we present a hierarchical composition of sparse GPs that is used to predict missing values at each dimension using all the variables from the other dimensions. We call the approach missing GP (MGP). MGP can be trained simultaneously to impute all observed missing values. Specifically, it outputs a predictive distribution for each missing value that is then used in the imputation of other missing values. We evaluate MGP on one private clinical data set and four UCI datasets with different percentages of missing values. We compare the performance of MGP with other state-of-the-art methods for imputing missing values, including variants based on sparse GPs and deep GPs. The results obtained show a significantly better performance of MGP.  ( 2 min )
    Bayesian Sample Size Prediction for Online Activity. (arXiv:2111.12157v2 [stat.ML] UPDATED)
    In many contexts it is useful to predict the number of individuals in some population who will initiate a particular activity during a given period. For example, the number of users who will install a software update, the number of customers who will use a new feature on a website or who will participate in an A/B test. In practical settings, there is heterogeneity amongst individuals with regard to the distribution of time until they will initiate. For these reasons it is inappropriate to assume that the number of new individuals observed on successive days will be identically distributed. Given observations on the number of unique users participating in an initial period, we present a simple but novel Bayesian method for predicting the number of additional individuals who will participate during a subsequent period. We illustrate the performance of the method in predicting sample size in online experimentation.  ( 2 min )
    DADApy: Distance-based Analysis of DAta-manifolds in Python. (arXiv:2205.03373v1 [cs.LG])
    DADApy is a python software package for analysing and characterising high-dimensional data manifolds. It provides methods for estimating the intrinsic dimension and the probability density, for performing density-based clustering and for comparing different distance metrics. We review the main functionalities of the package and exemplify its usage in toy cases and in a real-world application. The package is freely available under the open-source Apache 2.0 license and can be downloaded from the Github page https://github.com/sissa-data-science/DADApy.  ( 2 min )
    Designing Robust Biotechnological Processes Regarding Variabilities using Multi-Objective Optimization Applied to a Biopharmaceutical Seed Train Design. (arXiv:2205.03261v1 [cs.LG])
    Development and optimization of biopharmaceutical production processes with cell cultures is cost- and time-consuming and often performed rather empirically. Efficient optimization of multiple objectives such as process time, viable cell density, number of operating steps and cultivation scales, required medium, amount of product, and product quality represents a promising approach. This contribution presents a workflow which couples uncertainty-based upstream simulation and Bayes optimization using Gaussian processes. Its application is demonstrated in a simulation case study for a relevant industrial task in process development, the design of a robust cell culture expansion process (seed train), meaning that despite uncertainties and variabilities concerning cell growth, low variations of viable cell density during the seed train are obtained. Compared to a non-optimized reference seed train, the optimized process showed much lower deviation rates regarding viable cell densities (<~10% instead of 41.7%) using 5 or 4 shake flask scales, and seed train duration could be reduced by 56 h from 576 h to 520 h. Overall, it is shown that applying Bayes optimization allows for optimization of a multi-objective optimization function with several optimizable input variables and under a considerable number of constraints with a low computational effort. This approach has the potential to be used in the form of a decision tool, e.g. for the choice of an optimal and robust seed train design or for further optimization tasks within process development.  ( 2 min )
    Learning Optimal Conformal Classifiers. (arXiv:2110.09192v3 [cs.LG] UPDATED)
    Modern deep learning based classifiers show very high accuracy on test data but this does not provide sufficient guarantees for safe deployment, especially in high-stakes AI applications such as medical diagnosis. Usually, predictions are obtained without a reliable uncertainty estimate or a formal guarantee. Conformal prediction (CP) addresses these issues by using the classifier's predictions, e.g., its probability estimates, to predict confidence sets containing the true class with a user-specified probability. However, using CP as a separate processing step after training prevents the underlying model from adapting to the prediction of confidence sets. Thus, this paper explores strategies to differentiate through CP during training with the goal of training the model with the conformal wrapper end-to-end. In our approach, conformal training (ConfTr), we specifically "simulate" conformalization on mini-batches during training. Compared to standard training, ConfTr reduces the average confidence set size (inefficiency) of state-of-the-art CP methods applied after training. Moreover, it allows one to "shape" the confidence sets predicted at test time, which is difficult for standard CP. In experiments with several datasets, we show ConfTr can influence how inefficiency is distributed across classes, or guide the composition of confidence sets in terms of the included classes, while retaining the guarantees offered by CP.  ( 2 min )
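    As background for the "separate processing step after training" mentioned above, here is a minimal sketch of split conformal prediction with a simple threshold score. It is not ConfTr; the score `1 - p(true class)`, the toy calibration data and the function names are assumptions chosen for illustration.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Calibrate a score threshold on held-out data.

    The nonconformity score is 1 - (probability assigned to the true class).
    The threshold is the ceil((n + 1) * (1 - alpha))-th smallest score.
    """
    scores = sorted(1.0 - probs[label] for probs, label in zip(cal_probs, cal_labels))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: include every class.
        return float("inf")
    return scores[rank - 1]


def prediction_set(probs, threshold):
    """All classes whose score 1 - p does not exceed the calibrated threshold."""
    return [k for k, p in enumerate(probs) if 1.0 - p <= threshold]


# Toy two-class calibration set: (predicted probabilities, true label).
calibration = [
    ([0.90, 0.10], 0), ([0.20, 0.80], 1), ([0.70, 0.30], 0),
    ([0.40, 0.60], 1), ([0.50, 0.50], 0), ([0.55, 0.45], 1),
    ([0.40, 0.60], 0), ([0.65, 0.35], 1), ([0.30, 0.70], 0),
]
cal_probs = [probs for probs, _ in calibration]
cal_labels = [label for _, label in calibration]

threshold = conformal_threshold(cal_probs, cal_labels, alpha=0.2)
confident_set = prediction_set([0.90, 0.10], threshold)  # one class
uncertain_set = prediction_set([0.50, 0.50], threshold)  # both classes
print(threshold, confident_set, uncertain_set)
```

    The size of the returned set is the "inefficiency" the abstract refers to: a confident prediction yields a single class, an uncertain one yields a larger set.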
    Low-rank Tensor Learning with Nonconvex Overlapped Nuclear Norm Regularization. (arXiv:2205.03059v1 [cs.LG])
    Nonconvex regularization has been popularly used in low-rank matrix learning. However, extending it for low-rank tensor learning is still computationally expensive. To address this problem, we develop an efficient solver for use with a nonconvex extension of the overlapped nuclear norm regularizer. Based on the proximal average algorithm, the proposed algorithm can avoid expensive tensor folding/unfolding operations. A special "sparse plus low-rank" structure is maintained throughout the iterations, and allows fast computation of the individual proximal steps. Empirical convergence is further improved with the use of adaptive momentum. We provide convergence guarantees to critical points on smooth losses and also on objectives satisfying the Kurdyka-Łojasiewicz condition. While the optimization problem is nonconvex and nonsmooth, we show that its critical points still have good statistical performance on the tensor completion problem. Experiments on various synthetic and real-world data sets show that the proposed algorithm is efficient in both time and space and more accurate than the existing state-of-the-art.  ( 2 min )
    Probabilistic learning constrained by realizations using a weak formulation of Fourier transform of probability measures. (arXiv:2205.03078v1 [stat.ML])
    This paper deals with taking into account a given set of realizations as constraints in the Kullback-Leibler minimum principle, which is used as a probabilistic learning algorithm. This permits the effective integration of data into predictive models. We consider the probabilistic learning of a random vector that is made up of either a quantity of interest (QoI) (unsupervised case) or the couple of the quantity of interest and a control parameter (supervised case). A training set of independent realizations of this random vector is assumed to be given and to be generated with a prior probability measure that is unknown. A target set of realizations of the QoI is available for the two considered cases. The framework is that of non-Gaussian problems in high dimension. A functional approach is developed on the basis of a weak formulation of the Fourier transform of probability measures (characteristic functions). The construction makes it possible to take into account the target set of realizations of the QoI in the Kullback-Leibler minimum principle. The proposed approach allows for estimating the posterior probability measure of the QoI (unsupervised case) or the posterior joint probability measure of the QoI with the control parameter (supervised case). The existence and the uniqueness of the posterior probability measure are analyzed for the two cases. The numerical aspects are detailed in order to facilitate the implementation of the proposed method. The presented application in high dimension demonstrates the efficiency and the robustness of the proposed algorithm.  ( 2 min )
    Benchmarking Econometric and Machine Learning Methodologies in Nowcasting. (arXiv:2205.03318v1 [stat.ML])
    Nowcasting can play a key role in giving policymakers timelier insight into data published with a significant time lag, such as final GDP figures. Currently, there is a plethora of methodologies and approaches for practitioners to choose from. However, a comprehensive comparison of these disparate approaches in terms of predictive performance and characteristics is lacking. This paper addresses that deficiency by examining the performance of 12 different methodologies in nowcasting US quarterly GDP growth, including all the methods most commonly employed in nowcasting, as well as some of the most popular traditional machine learning approaches. Performance was assessed on three different tumultuous periods in US economic history: the early 1980s recession, the 2008 financial crisis, and the COVID crisis. The two best performing methodologies in the analysis were long short-term memory artificial neural networks (LSTM) and Bayesian vector autoregression (BVAR). To facilitate further application and testing of each of the examined methodologies, an open-source repository containing boilerplate code that can be applied to different datasets is published alongside the paper, available at: github.com/dhopp1/nowcasting_benchmark.  ( 2 min )
    Scalable computation of prediction intervals for neural networks via matrix sketching. (arXiv:2205.03194v1 [stat.ML])
    Accounting for the uncertainty in the predictions of modern neural networks is a challenging and important task in many domains. Existing algorithms for uncertainty estimation require modifying the model architecture and training procedure (e.g., Bayesian neural networks) or dramatically increase the computational cost of predictions such as approaches based on ensembling. This work proposes a new algorithm that can be applied to a given trained neural network and produces approximate prediction intervals. The method is based on the classical delta method in statistics but achieves computational efficiency by using matrix sketching to approximate the Jacobian matrix. The resulting algorithm is competitive with state-of-the-art approaches for constructing predictive intervals on various regression datasets from the UCI repository.  ( 2 min )
    Differentially Private Generalized Linear Models Revisited. (arXiv:2205.03014v1 [cs.LG])
    We study the problem of $(\epsilon,\delta)$-differentially private learning of linear predictors with convex losses. We provide results for two subclasses of loss functions. The first case is when the loss is smooth and non-negative but not necessarily Lipschitz (such as the squared loss). For this case, we establish an upper bound on the excess population risk of $\tilde{O}\left(\frac{\Vert w^*\Vert}{\sqrt{n}} + \min\left\{\frac{\Vert w^* \Vert^2}{(n\epsilon)^{2/3}},\frac{\sqrt{d}\Vert w^*\Vert^2}{n\epsilon}\right\}\right)$, where $n$ is the number of samples, $d$ is the dimension of the problem, and $w^*$ is the minimizer of the population risk. Apart from the dependence on $\Vert w^\ast\Vert$, our bound is essentially tight in all parameters. In particular, we show a lower bound of $\tilde{\Omega}\left(\frac{1}{\sqrt{n}} + {\min\left\{\frac{\Vert w^*\Vert^{4/3}}{(n\epsilon)^{2/3}}, \frac{\sqrt{d}\Vert w^*\Vert}{n\epsilon}\right\}}\right)$. We also revisit the previously studied case of Lipschitz losses [SSTT20]. For this case, we close the gap in the existing work and show that the optimal rate is (up to log factors) $\Theta\left(\frac{\Vert w^*\Vert}{\sqrt{n}} + \min\left\{\frac{\Vert w^*\Vert}{\sqrt{n\epsilon}},\frac{\sqrt{\text{rank}}\Vert w^*\Vert}{n\epsilon}\right\}\right)$, where $\text{rank}$ is the rank of the design matrix. This improves over existing work in the high privacy regime. Finally, our algorithms involve a private model selection approach that we develop to enable attaining the stated rates without a-priori knowledge of $\Vert w^*\Vert$.  ( 2 min )
    Active Offline Policy Selection. (arXiv:2106.10251v4 [cs.LG] UPDATED)
    This paper addresses the problem of policy selection in domains with abundant logged data, but with a restricted interaction budget. Solving this problem would enable safe evaluation and deployment of offline reinforcement learning policies in industry, robotics, and recommendation domains among others. Several off-policy evaluation (OPE) techniques have been proposed to assess the value of policies using only logged data. However, there is still a big gap between the evaluation by OPE and the full online evaluation. Yet, large amounts of online interactions are often not possible in practice. To overcome this problem, we introduce active offline policy selection - a novel sequential decision approach that combines logged data with online interaction to identify the best policy. We use OPE estimates to warm start the online evaluation. Then, in order to utilize the limited environment interactions wisely we decide which policy to evaluate next based on a Bayesian optimization method with a kernel that represents policy similarity. We use multiple benchmarks, including real-world robotics, with a large number of candidate policies to show that the proposed approach improves upon state-of-the-art OPE estimates and pure online policy evaluation.  ( 2 min )
    Solar: $L_0$ solution path averaging for fast and accurate variable selection in high-dimensional data. (arXiv:2007.15707v3 [stat.ML] UPDATED)
    We propose a new variable selection algorithm, subsample-ordered least-angle regression (solar), and its coordinate descent generalization, solar-cd. Solar re-constructs lasso paths using the $L_0$ norm and averages the resulting solution paths across subsamples. Path averaging retains the ranking information of the informative variables while averaging out sensitivity to high dimensionality, improving variable selection stability, efficiency, and accuracy. We prove that: (i) with a high probability, path averaging perfectly separates informative variables from redundant variables on the average $L_0$ path; (ii) solar variable selection is consistent and accurate; and (iii) the probability that solar omits weak signals is controllable for finite sample size. We also demonstrate that: (i) solar yields, with less than $1/3$ of the lasso computation load, substantial improvements over lasso in terms of the sparsity (64-84% reduction in redundant variable selection) and accuracy of variable selection; (ii) compared with the lasso safe/strong rule and variable screening, solar largely avoids selection of redundant variables and rejection of informative variables in the presence of complicated dependence structures; (iii) the sparsity and stability of solar conserves residual degrees of freedom for data-splitting hypothesis testing, improving the accuracy of post-selection inference on weak signals with limited $n$; (iv) replacing lasso with solar in bootstrap selection (e.g., bolasso or stability selection) produces a multi-layer variable ranking scheme that improves selection sparsity and ranking accuracy with the computation load of only one lasso realization; and (v) given the computation resources, solar bootstrap selection is substantially faster (98% lower computation time) than the theoretical maximum speedup for parallelized bootstrap lasso (confirmed by Amdahl's law).  ( 2 min )
    R-GCN: The R Could Stand for Random. (arXiv:2203.02424v2 [cs.LG] UPDATED)
    The inception of the Relational Graph Convolutional Network (R-GCN) marked a milestone in the Semantic Web domain as a widely cited method that generalises end-to-end hierarchical representation learning to Knowledge Graphs (KGs). R-GCNs generate representations for nodes of interest by repeatedly aggregating parameterised, relation-specific transformations of their neighbours. However, in this paper, we argue that the R-GCN's main contribution lies in this "message passing" paradigm, rather than the learned weights. To this end, we introduce the "Random Relational Graph Convolutional Network" (RR-GCN), which leaves all parameters untrained and thus constructs node embeddings by aggregating randomly transformed random representations from neighbours, i.e., with no learned parameters. We empirically show that RR-GCNs can compete with fully trained R-GCNs in both node classification and link prediction settings.  ( 2 min )
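    The idea of "aggregating randomly transformed random representations from neighbours" can be sketched as a single untrained message-passing step. This is a simplified illustration, not the RR-GCN architecture from the paper: the mean aggregation, the Gaussian initialisation, the toy graph and all names are assumptions.

```python
import random


def random_matrix(rows, cols, rng):
    """Matrix of independent standard-normal entries (never trained)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Matrix-vector product for nested-list matrices."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def random_relational_layer(embeddings, triples, relation_weights):
    """One message-passing step over (head, relation, tail) triples.

    Each tail node receives W_relation @ embedding[head] from every incoming
    edge and averages the messages; nodes without incoming edges get zeros.
    """
    dim_out = len(next(iter(relation_weights.values())))
    sums = {node: [0.0] * dim_out for node in embeddings}
    counts = {node: 0 for node in embeddings}
    for head, relation, tail in triples:
        message = matvec(relation_weights[relation], embeddings[head])
        sums[tail] = [s + m for s, m in zip(sums[tail], message)]
        counts[tail] += 1
    return {
        node: [s / counts[node] for s in sums[node]] if counts[node] else sums[node]
        for node in embeddings
    }


def demo(seed, dim_in=3, dim_out=2):
    """Build a toy knowledge graph with random embeddings and run one layer."""
    rng = random.Random(seed)
    nodes = ["a", "b", "c"]
    triples = [("a", "cites", "b"), ("c", "cites", "b"), ("b", "authored_by", "c")]
    embeddings = {node: [rng.gauss(0.0, 1.0) for _ in range(dim_in)] for node in nodes}
    weights = {rel: random_matrix(dim_out, dim_in, rng) for rel in ("cites", "authored_by")}
    return embeddings, weights, random_relational_layer(embeddings, triples, weights)


embeddings, weights, output = demo(seed=0)
print(output)
```

    Because nothing is learned, the only inputs are the graph structure and the random seed; stacking several such steps would propagate information across more hops.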
    Belief propagation for permutations, rankings, and partial orders. (arXiv:2110.00513v2 [cs.AI] UPDATED)
    Many datasets give partial information about an ordering or ranking by indicating which team won a game, which item a user prefers, or who infected whom. We define a continuous spin system whose Gibbs distribution is the posterior distribution on permutations, given a probabilistic model of these interactions. Using the cavity method we derive a belief propagation algorithm that computes the marginal distribution of each node's position. In addition, the Bethe free energy lets us approximate the number of linear extensions of a partial order and perform model selection between competing probabilistic models, such as the Bradley-Terry-Luce model of noisy comparisons and its cousins.  ( 2 min )
    What Makes A Good Fisherman? Linear Regression under Self-Selection Bias. (arXiv:2205.03246v1 [math.ST])
    In the classical setting of self-selection, the goal is to learn $k$ models, simultaneously from observations $(x^{(i)}, y^{(i)})$ where $y^{(i)}$ is the output of one of $k$ underlying models on input $x^{(i)}$. In contrast to mixture models, where we observe the output of a randomly selected model, here the observed model depends on the outputs themselves, and is determined by some known selection criterion. For example, we might observe the highest output, the smallest output, or the median output of the $k$ models. In known-index self-selection, the identity of the observed model output is observable; in unknown-index self-selection, it is not. Self-selection has a long history in Econometrics and applications in various theoretical and applied fields, including treatment effect estimation, imitation learning, learning from strategically reported data, and learning from markets at disequilibrium. In this work, we present the first computationally and statistically efficient estimation algorithms for the most standard setting of this problem where the models are linear. In the known-index case, we require poly$(1/\varepsilon, k, d)$ sample and time complexity to estimate all model parameters to accuracy $\varepsilon$ in $d$ dimensions, and can accommodate quite general selection criteria. In the more challenging unknown-index case, even the identifiability of the linear models (from infinitely many samples) was not known. We show three results in this case for the commonly studied $\max$ self-selection criterion: (1) we show that the linear models are indeed identifiable, (2) for general $k$ we provide an algorithm with poly$(d) \exp(\text{poly}(k))$ sample and time complexity to estimate the regression parameters up to error $1/\text{poly}(k)$, and (3) for $k = 2$ we provide an algorithm for any error $\varepsilon$ and poly$(d, 1/\varepsilon)$ sample and time complexity.  ( 2 min )
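    The data-generating process described above, for linear models under the $\max$ selection criterion, can be sketched as follows. This only simulates observations; it is not the estimation algorithm from the paper. Per-model Gaussian noise, the toy weights and the function name are assumptions made for illustration.

```python
import random


def self_selected_sample(x, weights, noise_std=0.0, rng=random):
    """Draw one observation under max self-selection.

    Each of the k linear models produces w_j . x (+ optional Gaussian noise);
    only the largest output is observed. The returned index is available in
    the known-index setting and hidden in the unknown-index setting.
    """
    outputs = []
    for w in weights:
        noise = rng.gauss(0.0, noise_std) if noise_std > 0 else 0.0
        outputs.append(sum(w_i * x_i for w_i, x_i in zip(w, x)) + noise)
    index = max(range(len(outputs)), key=outputs.__getitem__)
    return outputs[index], index


# Two hypothetical linear models in d = 2 dimensions.
weights = [[1.0, 0.0], [0.0, 1.0]]

y1, j1 = self_selected_sample([2.0, 3.0], weights)   # second model wins
y2, j2 = self_selected_sample([5.0, -1.0], weights)  # first model wins
print((y1, j1), (y2, j2))

# Unknown-index setting: keep only y and discard the index.
rng = random.Random(0)
xs = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(5)]
observations = [(x, self_selected_sample(x, weights, noise_std=0.1, rng=rng)[0]) for x in xs]
```

    Unlike a mixture model, which model is observed here depends on the outputs themselves, which is why regressing $y$ on $x$ naively would be biased.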
    Optimally tackling covariate shift in RKHS-based nonparametric regression. (arXiv:2205.02986v1 [math.ST])
    We study the covariate shift problem in the context of nonparametric regression over a reproducing kernel Hilbert space (RKHS). We focus on two natural families of covariate shift problems defined using the likelihood ratios between the source and target distributions. When the likelihood ratios are uniformly bounded, we prove that the kernel ridge regression (KRR) estimator with a carefully chosen regularization parameter is minimax rate-optimal (up to a log factor) for a large family of RKHSs with regular kernel eigenvalues. Interestingly, KRR does not require full knowledge of the likelihood ratio apart from an upper bound on it. In striking contrast to the standard statistical setting without covariate shift, we also demonstrate that a naïve estimator, which minimizes the empirical risk over the function class, is strictly suboptimal under covariate shift as compared to KRR. We then address the larger class of covariate shift problems where the likelihood ratio is possibly unbounded yet has a finite second moment. Here, we show via careful simulations that KRR fails to attain the optimal rate. Instead, we propose a reweighted KRR estimator that weights samples based on a careful truncation of the likelihood ratios. Again, we are able to show that this estimator is minimax optimal, up to logarithmic factors.  ( 2 min )
    Lagrangian PINNs: A causality-conforming solution to failure modes of physics-informed neural networks. (arXiv:2205.02902v1 [cs.LG])
    Physics-informed neural networks (PINNs) leverage neural networks to find the solutions of partial differential equation (PDE)-constrained optimization problems with initial conditions and boundary conditions as soft constraints. These soft constraints are often considered to be the sources of the complexity in the training phase of PINNs. Here, we demonstrate that the challenge of training (i) persists even when the boundary conditions are strictly enforced, and (ii) is closely related to the Kolmogorov n-width associated with problems demonstrating transport, convection, traveling waves, or moving fronts. Given this realization, we describe the mechanism underlying training schemes such as those used in eXtended PINNs (XPINN), curriculum regularization, and sequence-to-sequence learning. For an important category of PDEs, i.e., those governed by nonlinear convection-diffusion equations, we propose reformulating PINNs on a Lagrangian frame of reference, i.e., LPINNs, as a PDE-informed solution. A parallel architecture with two branches is proposed. One branch solves for the state variables on the characteristics, and the second branch solves for the low-dimensional characteristic curves. The proposed architecture conforms to the causality innate to convection, and leverages the direction of travel of the information in the domain. Finally, we demonstrate that the loss landscapes of LPINNs are less sensitive to the so-called "complexity" of the problems, compared to those in the traditional PINNs in the Eulerian framework.  ( 2 min )
    Explainable multi-class anomaly detection on functional data. (arXiv:2205.02935v1 [stat.ML])
    In this paper we describe an approach for anomaly detection and its explainability in multivariate functional data. The anomaly detection procedure consists of transforming the series into a vector of features and using an Isolation forest algorithm. The explainable procedure is based on the computation of the SHAP coefficients and on the use of a supervised decision tree. We apply it on simulated data to measure the performance of our method and on real data coming from industry.  ( 2 min )

  • Open

    Hello, are there any suggestions for scalable multi-agent RL environments (I want to be able to change the number of agents in the env)? Thank you
    submitted by /u/Ok_Lab_2750 [link] [comments]  ( 1 min )
    "BARL: An Experimental Design Perspective on Model-Based Reinforcement Learning" (on Mehta et al 2021)
    submitted by /u/gwern [link] [comments]  ( 1 min )
    dialogue history in dialogue generation
    What is "dialogue history" in dialogue generation, as used in Section 4.3 of the paper titled "Deep Reinforcement Learning for Dialogue Generation"? https://arxiv.org/abs/1606.01541 submitted by /u/Western-Age3148 [link] [comments]  ( 1 min )
    I’ve published a new repo with a collection of Deep RL algorithms implemented with PyTorch
    submitted by /u/Top_Serve_2348 [link] [comments]
    Will training in multi-agent reinforcement learning converge? Assume there are two agents: "A gets stronger, B learns from its errors, B gets stronger, A learns from its errors, and so on...". Will this happen?
    submitted by /u/Professional_Card176 [link] [comments]  ( 1 min )
    reinforcement learning in finance and trading
    How is it different from time series analysis? And since there is an enormous amount of data to analyze, is it worth using RL in this field? If so, please link a GitHub repository collecting RL papers on finance and trading. submitted by /u/Western-Age3148 [link] [comments]  ( 1 min )
    Question about Vectorized Environments and GPU/Cuda training
    I have a bunch of questions that I can't seem to find any good answers to, so I might as well ask them here. I'm learning stablebaselines3 by creating a snake game. Pretty simple. I've enabled CUDA training so that I can use my 1080 Ti. When I vectorize my custom environment, I do see some efficiency gains in training, but not many (using the PPO model). For 1,000,000 timesteps of training: n_envs = 4 --> 20 minutes; n_envs = 12 --> 16 minutes; n_envs = 100 --> 13 minutes. The training time does not scale linearly with the number of environments running concurrently. So I'm guessing that either I'm doing something wrong or perhaps the efficiency gains aren't that impressive in general. It seems that the 1080 Ti has somewhere around 3000 CUDA cores. Does that mean I can run 3000 concurrent environments? Also... after 1 million timesteps the snake AI is still pretty dumb. Is this normal, or should I see relatively good results after 1 million timesteps, and if I don't, is that an indication that my reward system or something else isn't good? Thanks! submitted by /u/HaikuHaiku [link] [comments]  ( 1 min )
  • Open

    Top 10 Leading Universities in AI Research
    Deep learning, natural language processing, data analytics, and big-data mining are fields of Artificial Intelligence (AI), and many companies are looking for professionals in these fields. A professional degree in AI from a reputed university will help you get started in this industry. Read more submitted by /u/ridamughal110 [link] [comments]
    Is AI cost-effective?
    Artificial Intelligence (AI) has the potential to disrupt the global economy by bringing significant changes to the industry. Corporations are recognizing AI's competitive edge, and as a result, a growing number of companies are implementing AI technology and reaping significant benefits. Read more submitted by /u/ridamughal110 [link] [comments]
    Overview of over 160 Biases (Belief, decision-making & behavioral, Social, Memory)
    Cognitive biases are an essential and serious issue. Especially for people who deal with data (algorithms). I've compiled a list (pdf/EPUB) of over 160 biases (mainly from Wikipedia). Maybe this is useful for some. These biases affect belief formation, reasoning processes, business & economic decisions, and human behavior in general. Let's learn more about our human biases to make less biased conclusions in the future. The PDF/EPUB can be downloaded for free on leanpub: Cognitive Biases: A Brief Overview of Over 160 Cognitive Biases submitted by /u/Philo167 [link] [comments]  ( 1 min )
    Poker AI Plays Itself, Match 3
    submitted by /u/bluboxsw [link] [comments]
    Alibaba Introduces ‘FederatedScope’: An Easy-To-Use Federated Learning Platform Providing Comprehensive Functionalities
    Federated learning is a machine learning technique that trains a model over several dispersed nodes or hosts, as the name suggests. Each node utilizes its own training data. If the model parameters are shared between nodes rather than the raw data, the data can be kept private. Due to privacy concerns, obtaining training data to design and evolve machine learning models is increasingly being questioned, and federated learning can help alleviate some of these issues. The Chinese e-commerce behemoth, Alibaba, has created a federated learning platform that allows machine learning algorithms to be constructed without sharing training data. Continue Reading Github: https://github.com/alibaba/FederatedScope submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    Uniting human brains and computers: A new type of AI
    submitted by /u/bendee983 [link] [comments]
    The ML pipeline & Types of roles in ML
    submitted by /u/mr-minion [link] [comments]
    Presentation of Project: The Ark Evolving
    I am a programmer with intermediate knowledge, and this is a project I am developing on my own. The goal is to create and run an artificial intelligence that plays the Magic Arena card game at an advanced level; the AI will use supervised learning and the minimax algorithm. At the moment I only have the concept, which is why I am making this post to ask for any advice and recommendations you can give me. submitted by /u/Merelex_Buk-12 [link] [comments]  ( 1 min )
    Iterative to Launch Open Source Tool, First to Train Machine Learning Models on Any Cloud Using HashiCorp’s Terraform Solution
    submitted by /u/thumbsdrivesmecrazy [link] [comments]
    Apple loses Ian Goodfellow, director of machine learning, inventor of GANs, over return-to-office policy
    submitted by /u/Zirius_Sadfaces [link] [comments]  ( 1 min )
    Watch this model explain, debug, document + generate test cases and answer questions about your code!
    submitted by /u/landongarrison [link] [comments]  ( 1 min )
  • Open

    [D] Inequality of the paper [On the uncertainty principle of neural networks]
    I'm reading the paper [On the uncertainty principle of neural networks](https://arxiv.org/abs/2205.01493), and I doubt one inequality. Between equations (7) and (8), there is a part that states "If we use the property ~~~": https://preview.redd.it/tz6k5htt0cy81.png?width=774&format=png&auto=webp&s=99fc8b5701e00fdf253e2f52eb2fde240772fe5e Is the inequality highlighted in yellow true? I think the inequality should go the opposite way, by the arithmetic–geometric mean inequality. Thanks. submitted by /u/jryoungw2035 [link] [comments]  ( 1 min )
    [News] This “amateur” programmer once used 50 sheets of 1080Ti to fight cancer
    submitted by /u/coolwulf [link] [comments]  ( 1 min )
    [P] [python] Trading Bot! Our team is looking for 2 new team members! We already done a lot! Info in text :)
    Hello, Our team has four members based in Europe, 27-39 years old. We are building a trading bot from scratch, based on a strategy related to momentum and fractal geometry. I traded that strategy manually for three or more years and made it unique. The live trader (the bot) is already built, but it requires more development. Backtrader is 90% completed. The results are promising, and that is what keeps us pushing the project. All team members share the source code as our reward. We have decided to bring on 1 or 2 more team members. Please feel free to approach me. Thanks! submitted by /u/Prior_Regret_4138 [link] [comments]  ( 1 min )
    [D] Machine Learning - WAYR (What Are You Reading) - Week 137
    This is a place to share machine learning research papers, journals, and articles that you're reading this week. If it relates to what you're researching, by all means elaborate and give us your insight, otherwise it could just be an interesting paper you've read. Please try to provide some insight from your understanding and please don't post things which are present in wiki. Preferably you should link the arxiv page (not the PDF, you can easily access the PDF from the summary page but not the other way around) or any other pertinent links. Previous weeks: Week 1 through Week 136, grouped as 1-10, 11-20, 21-30, 31-40, 41-50, 51-60, 61-70, 71-80, 81-90, 91-100, 101-110, 111-120, 121-130, 131-140. Most upvoted papers two weeks ago: /u/CatalyzeX_code_bot: Paper link. Besides that, there are no rules, have fun. submitted by /u/ML_WAYR_bot [link] [comments]  ( 1 min )
    [D] Is WGAN-GP gradient penalty applicable to the generator?
    I am currently studying a set of papers on GAN-driven Video-to-Speech synthesis, and one of those in particular has got me scratching my head in confusion. The study describes an architecture based on WGAN-GP, a modification of WGAN which aims to enforce 1-Lipschitz constraint by means of a gradient penalty. Specifically, the original paper presents an algorithm in which said penalty is applied to the critic, keeping the generator's loss function relatively simple. In this paper, however, the loss functions specified show the penalty applied to the generator instead, with the critic's loss function being identical to that of a standard WGAN. They do, however, state right below that " The coefficient for the gradient penalty λ is kept at the value of 10 for both critics..." https://preview.redd.it/gdp35hihsay81.png?width=793&format=png&auto=webp&s=7eedf0103ee3b33afe991148ac5a7552d262f35f So the question is: which is it? Should I consider this to be a typo of some sort, or am I not understanding something in this paper? I can understand how keeping the function's gradient norm below 1 may help when applied to the critic, but what benefit could it have for the generator? Are the other loss functions used in the paper somehow relevant perhaps? submitted by /u/ShujiMikami [link] [comments]  ( 1 min )
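    For reference, the objective in the original WGAN-GP paper (Gulrajani et al., 2017), which the post contrasts with the paper in question, can be written as below. Here $D$ is the critic, $G$ the generator, $\mathbb{P}_r$ the real data distribution, $\mathbb{P}_g$ the generator distribution, $\hat{x}$ a point sampled uniformly along straight lines between pairs of real and generated samples, and $\lambda$ the penalty coefficient (10 in the original paper).

```latex
% Critic loss: Wasserstein term plus gradient penalty
L_D = \mathbb{E}_{\tilde{x} \sim \mathbb{P}_g}\left[ D(\tilde{x}) \right]
    - \mathbb{E}_{x \sim \mathbb{P}_r}\left[ D(x) \right]
    + \lambda \, \mathbb{E}_{\hat{x} \sim \mathbb{P}_{\hat{x}}}
      \left[ \left( \lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1 \right)^2 \right]

% Generator loss: no penalty term
L_G = - \mathbb{E}_{z \sim p(z)}\left[ D(G(z)) \right]
```

    In that formulation the gradient is taken with respect to the critic's input and the penalty is used only in the critic update, which is consistent with the quoted sentence about a coefficient "for both critics". Whether the paper discussed in the post intends something different cannot be settled from the excerpt alone.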
    [D] Looking for a python library that implements decision tree regressors handling categorical features
    Hello community, I am a bit baffled scikit-learn does not support this. I am looking for a good python library that enables fitting a decision tree regressor on both numerical and categorical features (non-binary tree). Could you point me to one if you know any, please? Thanks! (PS: This is for visualization and interpretability, so things like catboost won't do) EDIT: I think I may have found what I was looking for: chefboost (PS: unfortunately it is a bit simplistic and does not support pruning atm) Also xMattC3 pointed this ongoing PR for scikit learn: NOCATS submitted by /u/yannbouteiller [link] [comments]  ( 2 min )
    [P] I’ve been trying to understand the limits of some of the available machine learning models out there. Built an app that lets you try a mix of CLIP from Open AI + Apple’s version of MobileNet, and more directly on your phone's camera roll.
    submitted by /u/Playgroundai [link] [comments]  ( 1 min )
    [N] We're sharing details on new methods to more efficiently train Vision Transformers, a model architecture that can achieve state-of-the-art results on a wide range of computer vision tasks.
    submitted by /u/AbjectDrink3276 [link] [comments]  ( 1 min )
    [D] Simple Questions Thread
    Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead! Thread will stay alive until next one so keep posting after the date in the title. Thanks to everyone for answering questions in the previous thread! submitted by /u/AutoModerator [link] [comments]  ( 1 min )
    [P] Comparison of NLP Sentiment Analysis ML models - VADER vs roBERTa
    submitted by /u/robikscuber [link] [comments]
    [N] Deepmind's Flamingo combines speech and vision
    submitted by /u/much_successes [link] [comments]
    [D] Dynamic Time Warping to align sequence to part of a longer sequence
    I have a dataset of audio files, and I wish to create a search functionality in the audio space: given a short fragment containing only one spoken word, I want to find the exact location of that word in a longer video containing multiple words. Let's say I want to avoid audio transcription for some reason. So I want to find the exact location within the audio without knowing which word is used, but just purely based on the waveform. I could try doing this with a sliding window, but I thought Dynamic Time Warping (DTW) would be an interesting alternative. However, DTW requires that the start and end point of both sequences are the same. This isn't the case here, as the query sequence is a subset of the longer sequence (or may even not be present in the longer sequence). Also, finding the start and end points in the longer sequence is kind of the goal, so it's not useful as a prior. Are there any algorithms similar to DTW or any approaches I should know about? submitted by /u/RaptorDotCpp [link] [comments]  ( 3 min )
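    One family of methods that matches this description is subsequence DTW (sometimes called open-begin/open-end DTW), where the query may start and end anywhere in the longer sequence. The following is a minimal sketch on 1-D sequences with an absolute-difference cost; real audio would more likely use per-frame feature vectors (e.g. MFCCs) and a vector distance, which is an assumption not covered here. The function name is illustrative.

```python
import math


def subsequence_dtw(query, reference):
    """Find the span of `reference` that best matches `query` under DTW.

    Unlike classic DTW, the match may start and end anywhere in `reference`.
    Returns (cost, start_index, end_index), both indices inclusive.
    """
    n, m = len(query), len(reference)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")

    # cost[i][j]: best cost of aligning query[:i+1] to a span ending at reference[j]
    # start[i][j]: index in reference where that best span begins
    cost = [[math.inf] * m for _ in range(n)]
    start = [[0] * m for _ in range(n)]

    for j in range(m):
        # Free start: the first query sample may align to any reference sample.
        cost[0][j] = abs(query[0] - reference[j])
        start[0][j] = j

    for i in range(1, n):
        cost[i][0] = cost[i - 1][0] + abs(query[i] - reference[0])
        start[i][0] = 0
        for j in range(1, m):
            # Standard DTW step pattern: diagonal, vertical, horizontal.
            candidates = (
                (cost[i - 1][j - 1], start[i - 1][j - 1]),
                (cost[i - 1][j], start[i - 1][j]),
                (cost[i][j - 1], start[i][j - 1]),
            )
            best_cost, best_start = min(candidates, key=lambda c: c[0])
            cost[i][j] = best_cost + abs(query[i] - reference[j])
            start[i][j] = best_start

    # Free end: pick the cheapest column in the last row.
    end = min(range(m), key=lambda j: cost[n - 1][j])
    return cost[n - 1][end], start[n - 1][end], end
```

    For example, `subsequence_dtw([1, 2, 3], [0, 0, 1, 2, 3, 0, 0])` locates the query at indices 2 through 4 with zero cost. For the case where the word may be absent, one option is to reject matches whose cost exceeds a threshold; how to choose that threshold is left open.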
    [D] AWS sagemaker vs similar tool which can run on any cloud
    Hi all, I've been a user of AWS SageMaker for quite some time. Now we have some credits on other clouds, and I want a multi-cloud solution that can help me run training jobs. I'm planning on this approach: (1) Have some multi-cloud object storage and save all my data there. Will MinIO help? (2) Use Terraform to create configuration files that create nodes on any cloud. How difficult will this be? (3) Write a simple configuration file that takes the cloud provider, instance types, training Docker image, and storage path, and runs a training job. Is this implemented already? I am already exploring some serverless training options. Any thoughts on this? Is anyone with the same problem interested in contributing? submitted by /u/scb_11 [link] [comments]  ( 1 min )
    [P] Optimizing ML Workloads with TPI (Terraform Provider Iterative) - One Provider to Rule AWS, Azure, GCP, K8s
    The guide introduces Terraform Provider Iterative (TPI), an open-source tool extending the functionality of Terraform. The tool enables full lifecycle management of computing resources and is designed specifically for ML pipelines: Machine Learning Workloads with Terraform Provider Iterative. It was designed for machine learning (ML/AI) teams and optimizes CPU/GPU expenses. Features: unified auto-scaling groups for all the major cloud providers (AWS, Azure, GCP, and Kubernetes); spot-instance auto-recovery (if an instance was evicted/terminated) with data and checkpoint synchronization; auto-termination of instances when ML training is finished, so you won't forget to terminate your expensive GPU instance for a week :); use of Terraform commands and config (HCL). submitted by /u/cmstrump [link] [comments]  ( 1 min )
    [D] Tips for reviewing at top A* conferences
    I'll be reviewing papers at a top conference for the first time this year, so I was wondering if you could share some tips on how to be a good reviewer and how you typically review a submission at any of the top A* conferences. submitted by /u/furious_madman_123 [link] [comments]  ( 1 min )
    [N] Ian Goodfellow, Apple’s director of machine learning, is leaving the company due to its return to work policy. In a note to staff, he said “I believe strongly that more flexibility would have been the best policy for my team.” He was likely the company’s most cited ML expert.
    submitted by /u/hardmaru [link] [comments]  ( 4 min )
    [D] How much time do you spend transforming data before training a model?
    I am looking for straightforward ways, if they exist, to check for data transformation opportunities for different datasets. submitted by /u/sphinx00777 [link] [comments]
  • Open

    Need help in understanding role of activation functions.
    This question is being asked here after I had a disagreement with a professor of mine. As far as I understand it, activation functions are used to make neural networks nonlinear, but he believes they activate and deactivate individual nodes of a NN. I can't get my head around how an activation function activates a neuron without using a threshold function. Taking the case of a neuron with a sigmoid activation, the output will never be 0; it is minuscule when x is a large negative number, but still not 0. I tried to find the answer online but found sites with differing opinions. submitted by /u/Pritesh190801 [link] [comments]  ( 2 min )
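    Both points raised in the post can be checked numerically. The sketch below uses scalar "layers" with arbitrary, made-up weights; it shows that a sigmoid output stays strictly positive for a large negative input, and that stacking two linear layers without an activation is still a straight line, while inserting a sigmoid between them is not. The three-point straightness check is only enough for this illustration and is not a general test.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def linear(w, b, x):
    return w * x + b


def two_linear_layers(x):
    # No activation in between: 3 * (2x + 1) - 4 = 6x - 1, still a straight line.
    return linear(3.0, -4.0, linear(2.0, 1.0, x))


def two_layers_with_sigmoid(x):
    # Same weights, but with a sigmoid between the layers.
    return linear(3.0, -4.0, sigmoid(linear(2.0, 1.0, x)))


def looks_affine(f, xs=(-2.0, 0.0, 2.0)):
    # For equally spaced inputs, an affine function has zero second difference.
    a, b, c = (f(x) for x in xs)
    return math.isclose(a - 2.0 * b + c, 0.0, abs_tol=1e-9)


print(sigmoid(-20.0) > 0.0)                   # tiny, but not zero
print(looks_affine(two_linear_layers))        # linear layers collapse to one line
print(looks_affine(two_layers_with_sigmoid))  # the sigmoid breaks linearity
```

    Read this way, the two views in the post are not necessarily in conflict: a smooth activation scales a unit's contribution up or down rather than switching it off exactly, and that same nonlinearity is what keeps stacked layers from collapsing into a single linear map.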
    The ML pipeline & Types of roles in ML
    submitted by /u/mr-minion [link] [comments]
    Defeating Amazon Rekognition's face detection
    Amazon's IT department, also known as Amazon Web Services, or AWS, developed a computer vision platform called Rekognition. https://aws.amazon.com/rekognition/ Rekognition is a cloud-based software-as-a-service offering that does image and video analysis. This system does a lot; however, my focus is on face recognition of people (not only users of the system, but anyone). AWS provides cloud services, which means they have a virtually unlimited amount of space on their servers. There are claims that I couldn't confirm (so far) about Amazon getting data on people not only from social media sources, but also from sources that are officially not public. If the data is saved on AWS cloud services, Amazon practically has this data. Amazon takes pride in a long list of customers https://aws.amazon.com/rekognition/cu…  ( 2 min )
  • Open

    Network security guidelines for 2022
    Cyber security has become the buzzword in the tech-savvy world where the entire world hinges on digital platforms. It has become very critical for individuals to become cyber aware. People are thereby developing security tips to help individuals manage their digital presence. Whether for personal use or professional undertaking, cyber security has become a significant… Read More »Network security guidelines for 2022 The post Network security guidelines for 2022 appeared first on Data Science Central.  ( 3 min )
    Understanding EXIF Data and How to View It on Android Phones
    Most modern-day smartphones with great cameras make it really easy for anyone and everyone to click good photos. But what most people don’t realize is that with each photograph, they’re also capturing a lot of personal information which, when they share that photo on social media, becomes available to a much wider set of people on the… Read More »Understanding EXIF Data and How to View It on Android Phones The post Understanding EXIF Data and How to View It on Android Phones appeared first on Data Science Central.  ( 3 min )
    AI In Manufacturing: Know How Latest Intelligence Reshaping the Industries with Speed and Accuracy
    In the last few years, the industrial revolution has been the most talked-about change faced by the industrial sector. It encompasses all the latest technology trends that are affecting industries around the world. Autonomous cars, smart connected devices, sensors, computer chips, and many other technologies represent this transformation. This is happening because the manufacturing industry has been… Read More »AI In Manufacturing: Know How Latest Intelligence Reshaping the Industries with Speed and Accuracy The post AI In Manufacturing: Know How Latest Intelligence Reshaping the Industries with Speed and Accuracy appeared first on Data Science Central.  ( 3 min )
    Social Media, Cyber Bullying, and Need For Content Moderation
    A rise in netizens and social media platforms, coupled with the growth of the mobile internet, has resulted in a jump in the creation and consumption of User Generated Content (UGC). Social media platforms, in all honesty, have become a major channel for disseminating, circulating, and exchanging information to billions of people on the internet… Read More »Social Media, Cyber Bullying, and Need For Content Moderation The post Social Media, Cyber Bullying, and Need For Content Moderation appeared first on Data Science Central.  ( 6 min )
    Comparing Cloud Telephony With On-Premise Call Centers: An Upgrade
    Contact center dynamics are changing as well as the customer engagement landscape. Due to the growing customer expectations and time constraints, cloud call center solutions are more important than ever before. With technology, businesses and agents can offer omnichannel customer service, and customers can handle most queries themselves. In order to achieve high omnichannel engagement,… Read More »Comparing Cloud Telephony With On-Premise Call Centers: An Upgrade The post Comparing Cloud Telephony With On-Premise Call Centers: An Upgrade appeared first on Data Science Central.  ( 4 min )
    The long game: Desiloed systems and feedback loops by design (I of II)
    Data management (DM) discussions can be frustrating because both those feeling the pain and the consultants who try to help them are–90+ percent of the time, it seems–still using the same old ways. Those ways only go so far, and won’t go any farther. That’s because those who reinforce the old ways assume that what… Read More »The long game: Desiloed systems and feedback loops by design (I of II) The post The long game: Desiloed systems and feedback loops by design (I of II) appeared first on Data Science Central.  ( 5 min )
    The Graph of Thrones: the Now Graph and Eternal Graph in RDF-Star Modeling
    Warning: This is going to get heavy into Turtle code, but I think there’s enough here for it to be worth reading if you are involved in knowledge graph work. I’ve been working with knowledge graphs a lot lately, and a conversation that I had with a few other ontologists has been resonating in my… Read More »The Graph of Thrones: the Now Graph and Eternal Graph in RDF-Star Modeling The post The Graph of Thrones: the Now Graph and Eternal Graph in RDF-Star Modeling appeared first on Data Science Central.  ( 9 min )
  • Open

    Driving Experimentation Forward through a Working Group (Experimentation Program Series: Guide 03)
    In my previous post, I defined an experimentation program (ExPr) as the mechanism by which a company uses randomized controlled experiments to generate positive business results. An ExPr is composed of the people, processes, and infrastructure for running experiments at… Read More The post Driving Experimentation Forward through a Working Group (Experimentation Program Series: Guide 03) appeared first on ML in Production.  ( 7 min )

  • Open

    Any interesting “less obvious” real-world applications of RL
    Reinforcement learning has immediate applications in industrial robotics and other control-oriented tasks. Are there any interesting real-world applications of RL that are less obvious than robotics or trading? I saw one application in the crypto space and am curious about other possible applications of RL (in any sector). submitted by /u/blitzkreig3 [link] [comments]  ( 1 min )
    No module named 'gym.envs.classic_control.rendering' . How do I fix it?
    I ran into it when running retro.examples.random_agent . submitted by /u/sirThomasmacAbyss [link] [comments]  ( 1 min )
    Reasonable training result, but how to improve further?
    Hi all, I have a 4-DOF robot. I am trying to teach it this specific movement: "Whenever you move, don't move joint 1 (orange in the plot) at the same time as joints 2, 3, and 4". The corresponding reward function is: reward = 1 / (abs(torque_q1) * max(abs(torque_q2), abs(torque_q3), abs(torque_q4))). As the plot shows, the learned policy roughly reproduces the intended movement: first q1 moves, then the other joints. The part that I want to improve is around t=13. There, the q1 movement gradually decreases while the other joints gradually start to move. Is there a way to improve this so that q1 comes to a complete stop before the other joints start to move? https://preview.redd.it/do8gaqyhm3y81.png?width=2892&format=png&auto=webp&s=2bbc13523e9043de7a6316f3edd2e2741e910cd0 submitted by /u/Fun-Moose-3841 [link] [comments]  ( 1 min )
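    The reward in the post can be written out as a small function. One thing the code makes visible is that the formula divides by zero exactly when joint 1 is fully stopped, which is the behavior being asked for. The `eps` term below is an assumption added for this sketch to keep the value finite; it is not part of the poster's formula, and the function name is illustrative.

```python
def decoupling_reward(torques, eps=1e-6):
    """Reward from the post: 1 / (|tau_1| * max(|tau_2|, |tau_3|, |tau_4|)).

    `torques` is a sequence (tau_1, tau_2, tau_3, tau_4).
    `eps` is an added assumption that avoids division by zero when
    either factor vanishes; the original formula has no such term.
    """
    tau1, *others = torques
    product = abs(tau1) * max(abs(t) for t in others)
    return 1.0 / (product + eps)


# Joint 1 at rest while the others move: product is 0, reward is capped near 1 / eps.
print(decoupling_reward([0.0, 1.0, 0.5, 0.2]))
# All joints moving together: much smaller reward.
print(decoupling_reward([2.0, 1.0, 0.5, 0.2]))
```

    With this shape, the cap set by `eps` determines how strongly a complete stop of joint 1 is favored over a near stop, so it may be worth treating as a tunable parameter.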
    Is it possible to train the Reinforcement Learning agent using a data set (in order to make use of human expertise in a game set for example) to reinforce the training (like a combination between reinforcement learning and supervised learning )?
    submitted by /u/Ok_Lab_2750 [link] [comments]  ( 2 min )
    text summarization using RL
    Is it worth working on text summarization even though big tech companies are already working on this problem and have already achieved good results? Can someone suggest a topic in the NLP domain that can be explored using RL techniques? submitted by /u/Western-Age3148 [link] [comments]  ( 1 min )
    Can we use a stochastic policy in DDPG framework?
    In DDPG we use a deterministic policy, i.e. s--->a, each state has a deterministic action, but can we use a stochastic policy, i.e. the actions are sampled from a distribution. I think this might encourage exploration and thus can improve the performance. What do you think? submitted by /u/Better-Ad8608 [link] [comments]  ( 2 min )
    application of Reinforcement learning in NLP
    Could someone please give an overview of how RL is being used in NLP? I am looking for material on reinforcement learning in NLP, including papers and other sources. submitted by /u/Western-Age3148 [link] [comments]  ( 1 min )
    Anyone has experience with Isaac Gym
    Hi all, did anyone try to use Isaac Gym for a custom robot/ algorithm? In example scripts, they use def pre_physics_step(self, actions): to call the actions for the robot that is a child class of BaseTask. Unfortunately, I can not modify how these actions are created as the script for BaseTask is not open-sourced. Did anyone manage to modify the value of actions for the custom usage? submitted by /u/Fun-Moose-3841 [link] [comments]  ( 1 min )
  • Open

    [P] Markov chain with unequal sequence lengths
    I'm trying to build a simple Markov chain. I have data from therapy notes, where the therapist selects the overall topic of the session out of a list of 27 possible topics. The problem is that not every therapist tags their session topic consistently, and all clients have a different number of sessions. I'm trying to build a simple Markov chain to get probabilities of topic transition between sessions, but as you can see, the data are complicated. I was wondering if anyone has encountered a situation where there were uneven sequence lengths per observation/case, and how you went about building a Markov chain in this case? Thanks! submitted by /u/sebelly [link] [comments]  ( 1 min )
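    For a first-order Markov chain, unequal sequence lengths are not an obstacle in themselves, because the estimate only needs consecutive pairs pooled across clients. The sketch below counts those pairs and normalizes each row. The topic names are made up, untagged sessions are represented as `None`, and skipping any pair that touches an untagged session is just one possible assumption for handling the inconsistent tagging.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate first-order transition probabilities from variable-length sequences.

    Each sequence is one client's ordered list of session topics.
    Pairs involving an untagged session (None) are skipped (an assumption).
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        # zip pairs each session with the next one; a length-1 sequence yields nothing.
        for prev, nxt in zip(seq, seq[1:]):
            if prev is None or nxt is None:
                continue
            counts[prev][nxt] += 1

    probs = {}
    for prev, nxt_counts in counts.items():
        total = sum(nxt_counts.values())
        probs[prev] = {nxt: c / total for nxt, c in nxt_counts.items()}
    return probs


# Hypothetical clients with different numbers of sessions.
clients = [
    ["anxiety", "sleep", "anxiety", "work"],
    ["sleep", "anxiety", None, "work"],
    ["work"],
]
print(transition_probabilities(clients))
```

    Topics that never appear as a "from" state (here, "work") get no row; with 27 topics and sparse data, some form of smoothing may be needed, which this sketch leaves out.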
    WACV 2023? [R]
    I don't see a website or any info for WACV 2023. Do you think it will still be hosted, and what would be the expected submission deadline for round 1? submitted by /u/avd4292 [link] [comments]
    [D] The state of reverse summarization/highly conditioned text gen
    Hi all, I’m looking to do a bit of research into highly conditioned text generation, specifically reverse summarization tasks. This type of task basically entails the reverse of what models like Pegasus aim to accomplish; instead of summarizing long text, the goal is to generate the long text from a summary. I’m curious if there’s been any work in this before or if there is an existing state of the art? It’s impossible to search for this online since summarization is so popular that everything I can find just entails going from long text to summary. I’d especially be curious about attempts at this task on conversational data (e.g. the reverse of SAMSum) or news articles (the reverse of CNN/DailyMail). submitted by /u/woodworksio [link] [comments]  ( 1 min )
    Good dataset for lasso regression [P]
    Hello everyone, I'm currently writing a thesis on an accelerated proximal point algorithm, and I need to demonstrate the efficiency of my accelerated algorithm on a lasso regression problem. I need a dataset with a lot of features (to show the efficiency of lasso), somewhere between 1000 and 2000. Do you know where I can download such a dataset for lasso regression? That would be a huge help. Thank you! submitted by /u/Jeffrosslostson [link] [comments]  ( 1 min )
    [D] An End-to-End MLOps Platform Implementation using Open-source Tooling
    submitted by /u/ponderinghydrogen [link] [comments]
    [R] Fix the Noise: Disentangling Source Feature for Transfer Learning of StyleGAN (CVPRW 2022)
    submitted by /u/Cautious-Ad1373 [link] [comments]
    [P] T-SNE to view and order your Spotify tracks
    submitted by /u/40wd [link] [comments]  ( 2 min )
    [P] I created a DALL·E Flow website
    submitted by /u/tomd_96 [link] [comments]  ( 1 min )
    [R][P] Thin-Plate Spline Motion Model for Image Animation + Gradio Web Demo
    submitted by /u/Illustrious_Row_9971 [link] [comments]  ( 2 min )
    [D] What are the issues with using TMLE/G comp/Double Robust estimators to interpret ML models with marginal effects?
    So TMLE is a way to do causal inference using ML models. It is described in this book: https://tlverse.org/tlverse-handbook/tmle3.html Of course, the causality part comes from the domain assumptions and causal graph; without that it's just regular statistical inference/estimation. Kevin Murphy’s Prob ML 2 in Ch 35 also describes the G computation procedure as well as Double Robust estimation and obtaining uncertainty. Briefly, G comp involves perturbing the variable of interest by a small epsilon in both directions, making predictions on both datasets, averaging the difference, and dividing by 2eps. If the variable is categorical then you just do this for each category. This amounts to Pearl’s backdoor adjustment formula. If the causal assumptions are satisfied, this estimate is causal; otherwise it is just some marginal effect. People say that ML models struggle with causality and interpretability and are black boxes, but what is the issue with the above approach? Using G comp and enough data, in theory I could just throw a black box at the problem and still obtain an interpretable average effect size for an exposure (x variable) of interest, and if my variable selection was done right, it is also causal. Furthermore, this approach avoids the parametric assumptions of traditional regression, which would invalidate the inference if not satisfied anyway. So why isn’t this new causal or marginal effect stuff used more? It seems too good to be true: it's possible to obtain a CI and p value with these methods, yet they don't seem to have caught on much outside some academic papers. Is the weakness that there is more to interpretability than just a CI/p value/effect size? What are you looking for with it? submitted by /u/111llI0__-__0Ill111 [link] [comments]  ( 4 min )
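The perturb-and-average step described in the post above can be sketched as a central finite difference over the dataset. This is only an illustration of that one step, not TMLE or a doubly robust estimator, and it produces no confidence interval. The `predict` function here is a hypothetical stand-in for any fitted black-box model, and the data rows are made up.

```python
def average_marginal_effect(predict, rows, col, eps=1e-3):
    """Average central finite difference of `predict` with respect to column `col`.

    For each row, the chosen variable is shifted by +eps and -eps, both copies
    are scored, and the mean difference is divided by 2 * eps.
    """
    total = 0.0
    for row in rows:
        plus = list(row)
        minus = list(row)
        plus[col] += eps
        minus[col] -= eps
        total += predict(plus) - predict(minus)
    return total / (2 * eps * len(rows))

# Hypothetical stand-in for a fitted model: linear in x0, quadratic in x1.
def predict(row):
    return 2.0 * row[0] + row[1] ** 2

rows = [[0.5, 1.0], [1.5, -2.0], [-1.0, 3.0]]
ame_x0 = average_marginal_effect(predict, rows, col=0)
ame_x1 = average_marginal_effect(predict, rows, col=1)
print(ame_x0, ame_x1)
```

For this toy model the effect of x0 is its slope, while the effect of x1 is the mean of 2 * x1 over the rows, so the number depends on the data distribution. Whether either number is causal still rests entirely on the assumptions the post mentions.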
    [P] Does it make sense to train a model twice on separate datasets?
    TLDR: If I use transfer learning to train a model on one general face dataset, does it make sense to then train it again on a more specific set of faces (a set more similar to my test set) or will the training of the second dataset just overwrite everything learnt from the more general dataset? So I have the task of using a CNN for facial recognition. So I am using it for the classification of faces to different classes of people, each individual person being their own separate class. The training data I am given is very limited - I only have one image for each class. I have 100 classes (so I have 100 images in total, one image of each person). The approach I am using is transfer learning of the GoogLeNet architecture. However, instead of just training the GoogLeNet on the images of the …  ( 3 min )
    LSTM/CNN architectures for time series forecasting [Discussion]
    I've been doing data engineering for the last year and a half and have just started going back into ML. I have a time series classification problem with a mix of time varying covariates and static covariates. In the past, I've tried different types of RNNs, CNNs, and even CNN-LSTM. However, I haven't kept up with the latest literature and so I feel a bit behind on what the typical ordering of the architecture looks like. And looking over a bunch of different examples, I don't see a recurring pattern with any of them. For example, in between CNN layers, they will do pooling + BN + ReLU + Dropout. Then others will skip Dropout and do ReLU before BN. And on top of all of that, I have seen almost no other forms of regularization being applied besides the ones above and early stopping. I realize this is a broad question, but in general, assuming you are done with your preprocessing phase, what is the standard architectural ordering people use for LSTMs and CNNs? I also want to confirm: is the standard approach for situations with different covariate types to run the time varying stuff through LSTM/CNNs, then concat the result with the static covariates, and then run that through feed forward layers? submitted by /u/Think-Culture-4740 [link] [comments]  ( 1 min )
  • Open

    AI News | Breast Cancer Detection Beats Humans | AI Learns Concepts Across Video, Audio & Text | AI Music Hit Song Detection
    submitted by /u/getrich_or_diemining [link] [comments]  ( 1 min )
    From Data Center Modeling to Edge Deployments with Monitoring: Domino Enterprise MLOps for Oil & Gas Machine Learning
    submitted by /u/Dracutela [link] [comments]
    9 Best Artificial Intelligence books for beginners to expert to read in 2022 -
    submitted by /u/maneesh123456 [link] [comments]
    I created a website for accessing DALL·E Flow
    I created this Streamlit website for Jina AI's awesome DALL·E Flow project. What do you think? submitted by /u/tomd_96 [link] [comments]
    Python library for deploying machine learning models
    submitted by /u/Illustrious_Row_9971 [link] [comments]
  • Open

    Mentally calculating the day of the week
    In my previous post I mentioned John Conway’s Doomsday rule for calculating the day of the week for any date. This method starts off very simple, but gets more complicated when you actually use it. This post will present an alternative method that’s easier to use in practice and can be described more succinctly. Here’s […] Mentally calculating the day of the week first appeared on John D. Cook.  ( 3 min )
  • Open

    Determine whether an image, from a given sequence of images, has changed considerably
    Problem Statement: You have a sequence of images for a room that contains three important components: sink, piping, walls, and fire alarm. An image is taken every week. That means you have an initial image taken on week 0 and an image I taken on week I. The room is undergoing construction and each week you have to determine whether the room has undergone sufficient change. This process must continue until the completion of work on the room. There is no restriction on the perspective from which the image would be taken. So, for example, the image from week A would be of the room taken while standing in the right doorway while the image taken from week A+1 could be taken while standing in the left doorway. You have to determine whether the three important components (sink, piping, walls, …  ( 2 min )
    Decimal to Binary Converter with an MLPNN
    Is it possible to implement a simple multi-layer perceptron neural network that can convert decimal to unsigned binary? If it is possible, what activation functions would I need and how deep and wide would my layers need to be? I've tried training it with 14 trials with 100% accuracy for 5-bit ranging from 0 to 30 with a variety of desired outputs, so lack of data is most likely not a problem. My current (and fairly arbitrary) layer structure is: 1 input node, 10 tanh nodes, 10 tanh nodes, 5 sigmoid nodes. submitted by /u/EpicJoeR [link] [comments]  ( 1 min )
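For the setup described above (one input node, five sigmoid outputs), the training pairs could be built as in this sketch. It shows only the data encoding, not a network or an answer about depth and width. Scaling the input to [0, 1] and ordering the bits most-significant first are assumptions for illustration; the post does not say how its inputs or targets were encoded.

```python
def to_bits(n, width=5):
    """Return the unsigned binary digits of n, most-significant bit first."""
    return [(n >> i) & 1 for i in reversed(range(width))]

def make_dataset(max_value=30, width=5):
    """Build (inputs, targets) for every integer from 0 to max_value inclusive.

    Inputs are scaled to [0, 1] (an assumption) so a single tanh/sigmoid input
    node is not saturated; targets are 0/1 vectors matching sigmoid outputs.
    """
    inputs = [[n / max_value] for n in range(max_value + 1)]
    targets = [to_bits(n, width) for n in range(max_value + 1)]
    return inputs, targets

inputs, targets = make_dataset()
print(inputs[5], targets[5])
```

At prediction time the five sigmoid outputs would be thresholded at 0.5 to recover the bits. The lowest bit flips with every increment of the input, which makes it the hardest target to fit from a single scalar.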
    is FANN good in 2022?
    I made a game a few years ago in C++ with FANN just to mess around. Now I'm revisiting the concept but I want to do a better job. FANN is perfectly fast enough: on my PC it can run tens of thousands of small networks forward at 50 frames per second. But it is unstable and tends to crash every minute or so while doing that. Unfortunately the library does its own memory management so I don't really know how to stabilize it. If I want a neural net library that's fast, simple, and stable in C++, is FANN the best choice or still even a good choice? submitted by /u/rambutang [link] [comments]  ( 1 min )
  • Open

    U-FNO -- An enhanced Fourier neural operator-based deep-learning model for multiphase flow. (arXiv:2109.03697v3 [physics.geo-ph] UPDATED)
    Numerical simulation of multiphase flow in porous media is essential for many geoscience applications. Machine learning models trained with numerical simulation data can provide a faster alternative to traditional simulators. Here we present U-FNO, a novel neural network architecture for solving multiphase flow problems with superior accuracy, speed, and data efficiency. U-FNO is designed based on the newly proposed Fourier neural operator (FNO), which has shown excellent performance in single-phase flows. We extend the FNO-based architecture to a highly complex CO2-water multiphase problem with wide ranges of permeability and porosity heterogeneity, anisotropy, reservoir conditions, injection configurations, flow rates, and multiphase flow properties. The U-FNO architecture is more accurate in gas saturation and pressure buildup predictions than the original FNO and a state-of-the-art convolutional neural network (CNN) benchmark. Meanwhile, it has superior data utilization efficiency, requiring only a third of the training data to achieve accuracy equivalent to the CNN. U-FNO provides superior performance in highly heterogeneous geological formations and critically important applications such as gas saturation and pressure buildup "fronts" determination. The trained model can serve as a general-purpose alternative to routine numerical simulations of 2D-radial CO2 injection problems with significant speed-ups over traditional simulators.
    An Explanation of In-context Learning as Implicit Bayesian Inference. (arXiv:2111.02080v5 [cs.CL] UPDATED)
    Large language models (LMs) such as GPT-3 have the surprising ability to do in-context learning, where the model learns to do a downstream task simply by conditioning on a prompt consisting of input-output examples. The LM learns from these examples without being explicitly pretrained to learn. Thus, it is unclear what enables in-context learning. In this paper, we study how in-context learning can emerge when pretraining documents have long-range coherence. Here, the LM must infer a latent document-level concept to generate coherent next tokens during pretraining. At test time, in-context learning occurs when the LM also infers a shared latent concept between examples in a prompt. We prove when this occurs despite a distribution mismatch between prompts and pretraining data in a setting where the pretraining distribution is a mixture of HMMs. In contrast to messy large-scale datasets used to train LMs capable of in-context learning, we generate a small-scale synthetic dataset (GINC) where Transformers and LSTMs both exhibit in-context learning. Beyond the theory, experiments on GINC exhibit large-scale real-world phenomena including improved in-context performance with model scaling (despite the same pretraining loss), sensitivity to example order, and instances where zero-shot is better than few-shot in-context learning.
    Dangling-Aware Entity Alignment with Mixed High-Order Proximities. (arXiv:2205.02406v1 [cs.CL])
    We study dangling-aware entity alignment in knowledge graphs (KGs), which is an underexplored but important problem. As different KGs are naturally constructed by different sets of entities, a KG commonly contains some dangling entities that cannot find counterparts in other KGs. Therefore, dangling-aware entity alignment is more realistic than the conventional entity alignment where prior studies simply ignore dangling entities. We propose a framework using mixed high-order proximities on dangling-aware entity alignment. Our framework utilizes both the local high-order proximity in a nearest neighbor subgraph and the global high-order proximity in an embedding space for both dangling detection and entity alignment. Extensive experiments with two evaluation settings show that our framework more precisely detects dangling entities, and better aligns matchable entities. Further investigations demonstrate that our framework can mitigate the hubness problem on dangling-aware entity alignment.
    Automated Imbalanced Classification via Layered Learning. (arXiv:2205.02553v1 [cs.LG])
    In this paper we address imbalanced binary classification (IBC) tasks. Applying resampling strategies to balance the class distribution of training instances is a common approach to tackle these problems. Many state-of-the-art methods find instances of interest close to the decision boundary to drive the resampling process. However, under-sampling the majority class may potentially lead to important information loss. Over-sampling also may increase the chance of overfitting by propagating the information contained in instances from the minority class. The main contribution of our work is a new method called ICLL for tackling IBC tasks which is not based on resampling training observations. Instead, ICLL follows a layered learning paradigm to model the data in two stages. In the first layer, ICLL learns to distinguish cases close to the decision boundary from cases which are clearly from the majority class, where this dichotomy is defined using a hierarchical clustering analysis. In the subsequent layer, we use instances close to the decision boundary and instances from the minority class to solve the original predictive task. A second contribution of our work is the automatic definition of the layers which comprise the layered learning strategy using a hierarchical clustering model. This is a relevant discovery as this process is usually performed manually according to domain knowledge. We carried out extensive experiments using 100 benchmark data sets. The results show that the proposed method leads to better performance relative to several state-of-the-art methods for IBC.
    Characterizing player's playing styles based on Player Vectors for each playing position in the Chinese Football Super League. (arXiv:2205.02731v1 [cs.LG])
    Characterizing playing style is important for football clubs on scouting, monitoring and match preparation. Previous studies considered a player's style as a combination of technical performances, failing to consider the spatial information. Therefore, this study aimed to characterize the playing styles of each playing position in the Chinese Football Super League (CSL) matches, integrating a recently adopted Player Vectors framework. Data of 960 matches from 2016-2019 CSL were used. Match ratings, and ten types of match events with the corresponding coordinates for all the lineup players whose on-pitch time exceeded 45 minutes were extracted. Players were first clustered into 8 positions. A player vector was constructed for each player in each match based on the Player Vectors using Nonnegative Matrix Factorization (NMF). Another NMF process was run on the player vectors to extract different types of playing styles. The resulting player vectors discovered 18 different playing styles in the CSL. Six performance indicators of each style were investigated to observe their contributions. In general, the playing styles of forwards and midfielders are in line with football performance evolution trends, while the styles of defenders should be reconsidered. Multifunctional playing styles were also found in high rated CSL players.
    A Multimodal German Dataset for Automatic Lip Reading Systems and Transfer Learning. (arXiv:2202.13403v2 [cs.CV] UPDATED)
    Large datasets as required for deep learning of lip reading do not exist in many languages. In this paper we present the dataset GLips (German Lips) consisting of 250,000 publicly available videos of the faces of speakers of the Hessian Parliament, which was processed for word-level lip reading using an automatic pipeline. The format is similar to that of the English language LRW (Lip Reading in the Wild) dataset, with each video encoding one word of interest in a context of 1.16 seconds duration, which yields compatibility for studying transfer learning between both datasets. By training a deep neural network, we investigate whether lip reading has language-independent features, so that datasets of different languages can be used to improve lip reading models. We demonstrate learning from scratch and show that transfer learning from LRW to GLips and vice versa improves learning speed and performance, in particular for the validation set.
    RANG: A Residual-based Adaptive Node Generation Method for Physics-Informed Neural Networks. (arXiv:2205.01051v2 [cs.LG] UPDATED)
    Learning solutions of partial differential equations (PDEs) with Physics-Informed Neural Networks (PINNs) is an attractive alternative approach to traditional solvers due to its flexibility and ease of incorporating observed data. Despite the success of PINNs in accurately solving a wide variety of PDEs, the method still requires improvements in terms of computational efficiency. One possible improvement idea is to optimize the generation of training point sets. Residual-based adaptive sampling and quasi-uniform sampling approaches have been each applied to improve the training effects of PINNs, respectively. To benefit from both methods, we propose the Residual-based Adaptive Node Generation (RANG) approach for efficient training of PINNs, which is based on a variable density nodal distribution method for RBF-FD. The method is also enhanced by a memory mechanism to further improve training stability. We conduct experiments on three linear PDEs and three nonlinear PDEs with various node generation methods, through which the accuracy and efficiency of the proposed method compared to the predominant uniform sampling approach is verified numerically.  ( 2 min )
    Offline Vehicle Routing Problem with Online Bookings: A Novel Problem Formulation with Applications to Paratransit. (arXiv:2204.11992v2 [cs.AI] UPDATED)
    Vehicle routing problems (VRPs) can be divided into two major categories: offline VRPs, which consider a given set of trip requests to be served, and online VRPs, which consider requests as they arrive in real-time. Based on discussions with public transit agencies, we identify a real-world problem that is not addressed by existing formulations: booking trips with flexible pickup windows (e.g., 3 hours) in advance (e.g., the day before) and confirming tight pickup windows (e.g., 30 minutes) at the time of booking. Such a service model is often required in paratransit service settings, where passengers typically book trips for the next day over the phone. To address this gap between offline and online problems, we introduce a novel formulation, the offline vehicle routing problem with online bookings. This problem is very challenging computationally since it faces the complexity of considering large sets of requests -- similar to offline VRPs -- but must abide by strict constraints on running time -- similar to online VRPs. To solve this problem, we propose a novel computational approach, which combines an anytime algorithm with a learning-based policy for real-time decisions. Based on a paratransit dataset obtained from the public transit agency of Chattanooga, TN, we demonstrate that our novel formulation and computational approach lead to significantly better outcomes in this setting than existing algorithms.
    Subverting Fair Image Search with Generative Adversarial Perturbations. (arXiv:2205.02414v1 [cs.LG])
    In this work we explore the intersection of fairness and robustness in the context of ranking: when a ranking model has been calibrated to achieve some definition of fairness, is it possible for an external adversary to make the ranking model behave unfairly without having access to the model or training data? To investigate this question, we present a case study in which we develop and then attack a state-of-the-art, fairness-aware image search engine using images that have been maliciously modified using a Generative Adversarial Perturbation (GAP) model. These perturbations attempt to cause the fair re-ranking algorithm to unfairly boost the rank of images containing people from an adversary-selected subpopulation. We present results from extensive experiments demonstrating that our attacks can successfully confer significant unfair advantage to people from the majority class relative to fairly-ranked baseline search results. We demonstrate that our attacks are robust across a number of variables, that they have close to zero impact on the relevance of search results, and that they succeed under a strict threat model. Our findings highlight the danger of deploying fair machine learning algorithms in-the-wild when (1) the data necessary to achieve fairness may be adversarially manipulated, and (2) the models themselves are not robust against attacks.
    Tracking the risk of a deployed model and detecting harmful distribution shifts. (arXiv:2110.06177v4 [stat.ML] UPDATED)
    When deployed in the real world, machine learning models inevitably encounter changes in the data distribution, and certain -- but not all -- distribution shifts could result in significant performance degradation. In practice, it may make sense to ignore benign shifts, under which the performance of a deployed model does not degrade substantially, making interventions by a human expert (or model retraining) unnecessary. While several works have developed tests for distribution shifts, these typically either use non-sequential methods, or detect arbitrary shifts (benign or harmful), or both. We argue that a sensible method for firing off a warning has to both (a) detect harmful shifts while ignoring benign ones, and (b) allow continuous monitoring of model performance without increasing the false alarm rate. In this work, we design simple sequential tools for testing if the difference between source (training) and target (test) distributions leads to a significant increase in a risk function of interest, like accuracy or calibration. Recent advances in constructing time-uniform confidence sequences allow efficient aggregation of statistical evidence accumulated during the tracking process. The designed framework is applicable in settings where (some) true labels are revealed after the prediction is performed, or when batches of labels become available in a delayed fashion. We demonstrate the efficacy of the proposed framework through an extensive empirical study on a collection of simulated and real datasets.
    Efficient and Convergent Federated Learning. (arXiv:2205.01438v2 [cs.LG] UPDATED)
    Federated learning has advanced considerably over the last few years but still faces many challenges, such as how algorithms save communication resources, how they reduce computational costs, and whether they converge. To address these issues, this paper proposes a new federated learning algorithm (FedGiA) that combines gradient descent and the inexact alternating direction method of multipliers. It is shown that FedGiA is computation- and communication-efficient and converges linearly under mild conditions.
    Using Time-Series Privileged Information for Provably Efficient Learning of Prediction Models. (arXiv:2110.14993v2 [cs.LG] UPDATED)
    We study prediction of future outcomes with supervised models that use privileged information during learning. The privileged information comprises samples of time series observed between the baseline time of prediction and the future outcome; this information is only available at training time which differs from the traditional supervised learning. Our question is when using this privileged data leads to more sample-efficient learning of models that use only baseline data for predictions at test time. We give an algorithm for this setting and prove that when the time series are drawn from a non-stationary Gaussian-linear dynamical system of fixed horizon, learning with privileged information is more efficient than learning without it. On synthetic data, we test the limits of our algorithm and theory, both when our assumptions hold and when they are violated. On three diverse real-world datasets, we show that our approach is generally preferable to classical learning, particularly when data is scarce. Finally, we relate our estimator to a distillation approach both theoretically and empirically.  ( 2 min )
    GCN-Transformer for short-term passenger flow prediction on holidays in urban rail transit systems. (arXiv:2203.00007v2 [cs.LG] UPDATED)
    The short-term passenger flow prediction of the urban rail transit system is of great significance for traffic operation and management. The emerging deep learning-based models provide effective methods to improve prediction accuracy. However, most of the existing models mainly predict the passenger flow on general weekdays, while few studies focus on predicting the holiday passenger flow, which can provide more significant information for operators because congestion or accidents generally occur on holidays. To this end, we propose a deep learning-based model named GCN-Transformer comprising a graph convolutional network (GCN) and a Transformer for short-term passenger flow prediction on holidays. The GCN is applied to extract the spatial features of passenger flows and the Transformer is applied to extract the temporal features of passenger flows. Moreover, in addition to the historical passenger flow data, social media data are also incorporated into the prediction model, which has been proven to have a potential correlation with the fluctuation of passenger flow. The GCN-Transformer is tested on two large-scale real-world datasets from Nanning, China during the New Year holiday and is compared with several conventional prediction models. Results demonstrate its better robustness and advantages among baseline methods, which provides overwhelming support for practical applications of short-term passenger flow prediction on holidays.
    Hardness of Noise-Free Learning for Two-Hidden-Layer Neural Networks. (arXiv:2202.05258v2 [cs.LG] UPDATED)
    We give superpolynomial statistical query (SQ) lower bounds for learning two-hidden-layer ReLU networks with respect to Gaussian inputs in the standard (noise-free) model. No general SQ lower bounds were known for learning ReLU networks of any depth in this setting: previous SQ lower bounds held only for adversarial noise models (agnostic learning) or restricted models such as correlational SQ. Prior work hinted at the impossibility of our result: Vempala and Wilmes showed that general SQ lower bounds cannot apply to any real-valued family of functions that satisfies a simple non-degeneracy condition. To circumvent their result, we refine a lifting procedure due to Daniely and Vardi that reduces Boolean PAC learning problems to Gaussian ones. We show how to extend their technique to other learning models and, in many well-studied cases, obtain a more efficient reduction. As such, we also prove new cryptographic hardness results for PAC learning two-hidden-layer ReLU networks, as well as new lower bounds for learning constant-depth ReLU networks from label queries.
    Fitting an immersed submanifold to data via Sussmann's orbit theorem. (arXiv:2204.01119v2 [cs.LG] UPDATED)
    This paper describes an approach for fitting an immersed submanifold of a finite-dimensional Euclidean space to random samples. The reconstruction mapping from the ambient space to the desired submanifold is implemented as a composition of an encoder that maps each point to a tuple of (positive or negative) times and a decoder given by a composition of flows along finitely many vector fields starting from a fixed initial point. The encoder supplies the times for the flows. The encoder-decoder map is obtained by empirical risk minimization, and a high-probability bound is given on the excess risk relative to the minimum expected reconstruction error over a given class of encoder-decoder maps. The proposed approach makes fundamental use of Sussmann's orbit theorem, which guarantees that the image of the reconstruction map is indeed contained in an immersed submanifold.
    Negative Evidence Matters in Interpretable Histology Image Classification. (arXiv:2201.02445v3 [eess.IV] UPDATED)
    Using only global image-class labels, weakly-supervised learning methods, such as class activation mapping, allow training CNNs to jointly classify an image, and locate regions of interest associated with the predicted class. However, without any guidance at the pixel level, such methods may yield inaccurate regions. This problem is known to be more challenging with histology images than with natural ones, since objects are less salient, structures have more variations, and foreground and background regions have stronger similarities. Therefore, computer vision methods for visual interpretation of CNNs may not directly apply. In this paper, a simple yet efficient method based on a composite loss is proposed to learn information from the fully negative samples (i.e., samples without positive regions), and thereby reduce false positives/negatives. Our new loss function contains two complementary terms: the first exploits positive evidence collected from the CNN classifier, while the second leverages the fully negative samples from training data. In particular, a pre-trained CNN is equipped with a decoder that allows refining the regions of interest. The CNN is exploited to collect both positive and negative evidence at the pixel level to train the decoder. Our method called NEGEV benefits from the fully negative samples that naturally occur in the data, without any additional supervision signals beyond image-class labels. Extensive experiments show that our proposed method can substantially outperform related state-of-the-art methods on GlaS (public benchmark for colon cancer), and Camelyon16 (patch-based benchmark for breast cancer using three different backbones). Our results highlight the benefits of using both positive and negative evidence, the first obtained from a classifier, and the other naturally available in datasets.  ( 3 min )
    Causal Reasoning with Spatial-temporal Representation Learning: A Prospective Study. (arXiv:2204.12037v3 [cs.CV] UPDATED)
    Spatial-temporal representation learning is ubiquitous in various real-world applications, including visual comprehension, video understanding, multi-modal analysis, human-computer interaction, and urban computing. Due to the emergence of huge amounts of multi-modal heterogeneous spatial/temporal/spatial-temporal data in the big data era, the lack of interpretability, robustness, and out-of-distribution generalization is becoming a challenge for existing visual models. The majority of existing methods tend to fit the original data/variable distributions and ignore the essential causal relations behind the multi-modal knowledge, and there is no unified guidance or analysis of why modern spatial-temporal representation learning methods easily collapse into data bias and have limited generalization and cognitive abilities. Inspired by the strong inference ability of human-level agents, recent years have therefore witnessed great effort in developing causal reasoning paradigms to realize robust representation and model learning with good cognitive ability. In this paper, we conduct a comprehensive review of existing causal reasoning methods for spatial-temporal representation learning, covering fundamental theories, models, and datasets. The limitations of current methods and datasets are also discussed. Moreover, we propose some primary challenges, opportunities, and future research directions for benchmarking causal reasoning algorithms in spatial-temporal representation learning. This paper aims to provide a comprehensive overview of this emerging field, attract attention, encourage discussions, and bring to the forefront the urgency of developing novel causal reasoning methods, publicly available benchmarks, and consensus-building standards for reliable spatial-temporal representation learning and related real-world applications.
    OPT: Open Pre-trained Transformer Language Models. (arXiv:2205.01068v3 [cs.CL] UPDATED)
    Large language models, which are often trained for hundreds of thousands of compute days, have shown remarkable capabilities for zero- and few-shot learning. Given their computational cost, these models are difficult to replicate without significant capital. For the few that are available through APIs, no access is granted to the full model weights, making them difficult to study. We present Open Pre-trained Transformers (OPT), a suite of decoder-only pre-trained transformers ranging from 125M to 175B parameters, which we aim to fully and responsibly share with interested researchers. We show that OPT-175B is comparable to GPT-3, while requiring only 1/7th the carbon footprint to develop. We are also releasing our logbook detailing the infrastructure challenges we faced, along with code for experimenting with all of the released models.
    ISDE : Independence Structure Density Estimation. (arXiv:2203.09783v2 [cs.LG] UPDATED)
    In this paper, we propose ISDE (Independence Structure Density Estimation), an algorithm designed to estimate a multivariate density under Kullback-Leibler loss and the Independence Structure (IS) model. IS tackles the curse of dimensionality by separating features into independent groups. We explain the construction of ISDE and present some experiments to show its performance on synthetic and real-world data. Performance is measured quantitatively by comparing empirical $\log$-likelihood with other density estimation methods and qualitatively by analyzing outputted partitions of variables. We also provide information about complexity and running time.
    Hybrid ISTA: Unfolding ISTA With Convergence Guarantees Using Free-Form Deep Neural Networks. (arXiv:2204.11640v2 [cs.CV] UPDATED)
    It is promising to solve linear inverse problems by unfolding iterative algorithms (e.g., iterative shrinkage thresholding algorithm (ISTA)) as deep neural networks (DNNs) with learnable parameters. However, existing ISTA-based unfolded algorithms restrict the network architectures for iterative updates with the partial weight coupling structure to guarantee convergence. In this paper, we propose hybrid ISTA to unfold ISTA with both pre-computed and learned parameters by incorporating free-form DNNs (i.e., DNNs with arbitrary feasible and reasonable network architectures), while ensuring theoretical convergence. We first develop HCISTA to improve the efficiency and flexibility of classical ISTA (with pre-computed parameters) without compromising the convergence rate in theory. Furthermore, the DNN-based hybrid algorithm is generalized to popular variants of learned ISTA, dubbed HLISTA, to enable a free architecture of learned parameters with a guarantee of linear convergence. To our best knowledge, this paper is the first to provide a convergence-provable framework that enables free-form DNNs in ISTA-based unfolded algorithms. This framework is general to endow arbitrary DNNs for solving linear inverse problems with convergence guarantees. Extensive experiments demonstrate that hybrid ISTA can reduce the reconstruction error with an improved convergence rate in the tasks of sparse recovery and compressive sensing.  ( 2 min )
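    For orientation, the classical ISTA iteration that unfolded networks start from can be sketched as below. This is a minimal sketch of plain ISTA with a fixed step size of 1/L for the LASSO objective, not the paper's HCISTA or HLISTA; the function names, the default iteration count, and the test problem sizes are illustrative assumptions.

```python
import numpy as np


def soft_threshold(v, tau):
    """Elementwise soft-thresholding, the proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def lasso_objective(A, y, x, lam):
    """0.5 * ||A x - y||^2 + lam * ||x||_1."""
    return 0.5 * np.sum((A @ x - y) ** 2) + lam * np.sum(np.abs(x))


def ista(A, y, lam, n_iter=500):
    """Classical ISTA for the LASSO objective (hypothetical helper, fixed step 1/L)."""
    # L is the Lipschitz constant of the gradient of the quadratic term:
    # the squared spectral norm of A.
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)          # gradient of 0.5 * ||A x - y||^2
        x = soft_threshold(x - grad / L, lam / L)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((40, 100))
    x_true = np.zeros(100)
    x_true[[3, 17, 58]] = [1.5, -2.0, 0.8]
    y = A @ x_true
    x_hat = ista(A, y, lam=0.1)
    print("objective:", lasso_objective(A, y, x_hat, 0.1))
```

    Unfolded variants replace the fixed matrices and thresholds in each iteration with learned, per-layer parameters; the sketch keeps them fixed so that each step is the textbook update.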
    Approximate exploitability: Learning a best response in large games. (arXiv:2004.09677v4 [cs.LG] UPDATED)
    Researchers have demonstrated that neural networks are vulnerable to adversarial examples and subtle environment changes, both of which one can view as a form of distribution shift. To humans, the resulting errors can look like blunders, eroding trust in these agents. In prior games research, agent evaluation often focused on the in-practice game outcomes. While valuable, such evaluation typically fails to evaluate robustness to worst-case outcomes. Prior research in computer poker has examined how to assess such worst-case performance, both exactly and approximately. Unfortunately, exact computation is infeasible with larger domains, and existing approximations rely on poker-specific knowledge. We introduce ISMCTS-BR, a scalable search-based deep reinforcement learning algorithm for learning a best response to an agent, thereby approximating worst-case performance. We demonstrate the technique in several two-player zero-sum games against a variety of agents, including several AlphaZero-based agents.  ( 2 min )
    Smooth-Swap: A Simple Enhancement for Face-Swapping with Smoothness. (arXiv:2112.05907v2 [cs.CV] UPDATED)
    Face-swapping models have been drawing attention for their compelling generation quality, but their complex architectures and loss functions often require careful tuning for successful training. We propose a new face-swapping model called `Smooth-Swap', which excludes complex handcrafted designs and allows fast and stable training. The main idea of Smooth-Swap is to build smooth identity embedding that can provide stable gradients for identity change. Unlike the one used in previous models trained for a purely discriminative task, the proposed embedding is trained with a supervised contrastive loss promoting a smoother space. With improved smoothness, Smooth-Swap suffices to be composed of a generic U-Net-based generator and three basic loss functions, a far simpler design compared with the previous models. Extensive experiments on face-swapping benchmarks (FFHQ, FaceForensics++) and face images in the wild show that our model is also quantitatively and qualitatively comparable or even superior to the existing methods.
    Mode Reduction for Markov Jump Systems. (arXiv:2205.02697v1 [eess.SY])
    Switched systems are capable of modeling processes with underlying dynamics that may change abruptly over time. To achieve accurate modeling in practice, one may need a large number of modes, but this may in turn increase the model complexity drastically. Existing work on reducing system complexity mainly considers state space reduction, yet reducing the number of modes is less studied. In this work, we consider Markov jump linear systems (MJSs), a special class of switched systems where the active mode switches according to a Markov chain, and several issues associated with its mode complexity. Specifically, inspired by clustering techniques from unsupervised learning, we are able to construct a reduced MJS with fewer modes that approximates well the original MJS under various metrics. Furthermore, both theoretically and empirically, we show how one can use the reduced MJS to analyze stability and design controllers with significant reduction in computational cost while achieving guaranteed accuracy.  ( 2 min )
    Is Pessimism Provably Efficient for Offline RL?. (arXiv:2012.15085v3 [cs.LG] UPDATED)
    We study offline reinforcement learning (RL), which aims to learn an optimal policy based on a dataset collected a priori. Due to the lack of further interactions with the environment, offline RL suffers from the insufficient coverage of the dataset, which eludes most existing theoretical analysis. In this paper, we propose a pessimistic variant of the value iteration algorithm (PEVI), which incorporates an uncertainty quantifier as the penalty function. Such a penalty function simply flips the sign of the bonus function for promoting exploration in online RL, which makes it easily implementable and compatible with general function approximators. Without assuming the sufficient coverage of the dataset, we establish a data-dependent upper bound on the suboptimality of PEVI for general Markov decision processes (MDPs). When specialized to linear MDPs, it matches the information-theoretic lower bound up to multiplicative factors of the dimension and horizon. In other words, pessimism is not only provably efficient but also minimax optimal. In particular, given the dataset, the learned policy serves as the "best effort" among all policies, as no other policies can do better. Our theoretical analysis identifies the critical role of pessimism in eliminating a notion of spurious correlation, which emerges from the "irrelevant" trajectories that are less covered by the dataset and not informative for the optimal policy.  ( 2 min )
    WPPNets and WPPFlows: The Power of Wasserstein Patch Priors for Superresolution. (arXiv:2201.08157v2 [cs.CV] UPDATED)
    Exploiting image patches instead of whole images has proved to be a powerful approach to tackle various problems in image processing. Recently, Wasserstein patch priors (WPP), which are based on the comparison of the patch distributions of the unknown image and a reference image, were successfully used as data-driven regularizers in the variational formulation of superresolution. However, for each input image, this approach requires the solution of a non-convex minimization problem which is computationally costly. In this paper, we propose to learn two kinds of neural networks in an unsupervised way based on WPP loss functions. First, we show how convolutional neural networks (CNNs) can be incorporated. Once the network, called WPPNet, is learned, it can be applied very efficiently to any input image. Second, we incorporate conditional normalizing flows to provide a tool for uncertainty quantification. Numerical examples demonstrate the very good performance of WPPNets for superresolution in various image classes even if the forward operator is known only approximately.
    Object discovery and representation networks. (arXiv:2203.08777v2 [cs.CV] UPDATED)
    The promise of self-supervised learning (SSL) is to leverage large amounts of unlabeled data to solve complex tasks. While there has been excellent progress with simple, image-level learning, recent methods have shown the advantage of including knowledge of image structure. However, by introducing hand-crafted image segmentations to define regions of interest, or specialized augmentation strategies, these methods sacrifice the simplicity and generality that makes SSL so powerful. Instead, we propose a self-supervised learning paradigm that discovers this image structure by itself. Our method, Odin, couples object discovery and representation networks to discover meaningful image segmentations without any supervision. The resulting learning paradigm is simpler, less brittle, and more general, and achieves state-of-the-art transfer learning results for object detection and instance segmentation on COCO, and semantic segmentation on PASCAL and Cityscapes, while strongly surpassing supervised pre-training for video segmentation on DAVIS.
    Resilience of Bayesian Layer-Wise Explanations under Adversarial Attacks. (arXiv:2102.11010v3 [cs.LG] UPDATED)
    We consider the problem of the stability of saliency-based explanations of Neural Network predictions under adversarial attacks in a classification task. Saliency interpretations of deterministic Neural Networks are remarkably brittle even when the attacks fail, i.e. for attacks that do not change the classification label. We empirically show that interpretations provided by Bayesian Neural Networks are considerably more stable under adversarial perturbations of the inputs and even under direct attacks to the explanations. By leveraging recent results, we also provide a theoretical explanation of this result in terms of the geometry of the data manifold. Additionally, we discuss the stability of the interpretations of high level representations of the inputs in the internal layers of a Network. Our results demonstrate that Bayesian methods, in addition to being more robust to adversarial attacks, have the potential to provide more stable and interpretable assessments of Neural Network predictions.
    StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets. (arXiv:2202.00273v2 [cs.LG] UPDATED)
    Computer graphics has experienced a recent surge of data-centric approaches for photorealistic and controllable content creation. StyleGAN in particular sets new standards for generative modeling regarding image quality and controllability. However, StyleGAN's performance severely degrades on large unstructured datasets such as ImageNet. StyleGAN was designed for controllability; hence, prior works suspect its restrictive design to be unsuitable for diverse datasets. In contrast, we find the main limiting factor to be the current training strategy. Following the recently introduced Projected GAN paradigm, we leverage powerful neural network priors and a progressive growing strategy to successfully train the latest StyleGAN3 generator on ImageNet. Our final model, StyleGAN-XL, sets a new state-of-the-art on large-scale image synthesis and is the first to generate images at a resolution of $1024^2$ at such a dataset scale. We demonstrate that this model can invert and edit images beyond the narrow domain of portraits or specific object classes.
    A Change Dynamic Model for the Online Detection of Gradual Change. (arXiv:2205.01054v3 [stat.ML] UPDATED)
    Changes in the statistical properties of a stochastic process are typically assumed to occur via change-points, which demark instantaneous moments of complete and total change in process behavior. In cases where these transitions occur gradually, this assumption can result in a reduced ability to properly identify and respond to process change. With this observation in mind, we introduce a novel change-dynamic model for the online detection of gradual change in a Bayesian framework, in which change-points are used within a hierarchical model to indicate moments of gradual change onset or termination. We apply this model to synthetic data and EEG readings drawn during epileptic seizure, where we find our change-dynamic model can enable faster and more accurate identification of gradual change than traditional change-point models allow.
    Detection of Large Vessel Occlusions using Deep Learning by Deforming Vessel Tree Segmentations. (arXiv:2112.01797v3 [eess.IV] UPDATED)
    Computed Tomography Angiography is a key modality providing insights into the cerebrovascular vessel tree that are crucial for the diagnosis and treatment of ischemic strokes, in particular in cases of large vessel occlusions (LVO). Thus, the clinical workflow greatly benefits from an automated detection of patients suffering from LVOs. This work uses convolutional neural networks for case-level classification trained with elastic deformation of the vessel tree segmentation masks to artificially augment training data. Using only masks as the input to our model uniquely allows us to apply such deformations much more aggressively than one could with conventional image volumes while retaining sample realism. The neural network classifies the presence of an LVO and the affected hemisphere. In a 5-fold cross validated ablation study, we demonstrate that the use of the suggested augmentation enables us to train robust models even from few data sets. Training the EfficientNetB1 architecture on 100 data sets, the proposed augmentation scheme was able to raise the ROC AUC to 0.85 from a baseline value of 0.56 using no augmentation. The best performance was achieved using a 3D-DenseNet yielding an AUC of 0.87. The augmentation had positive impact in classification of the affected hemisphere as well, where the 3D-DenseNet reached an AUC of 0.93 on both sides.
    VL-InterpreT: An Interactive Visualization Tool for Interpreting Vision-Language Transformers. (arXiv:2203.17247v2 [cs.CV] UPDATED)
    Breakthroughs in transformer-based models have revolutionized not only the NLP field, but also vision and multimodal systems. However, although visualization and interpretability tools have become available for NLP models, internal mechanisms of vision and multimodal transformers remain largely opaque. With the success of these transformers, it is increasingly critical to understand their inner workings, as unraveling these black-boxes will lead to more capable and trustworthy models. To contribute to this quest, we propose VL-InterpreT, which provides novel interactive visualizations for interpreting the attentions and hidden representations in multimodal transformers. VL-InterpreT is a task agnostic and integrated tool that (1) tracks a variety of statistics in attention heads throughout all layers for both vision and language components, (2) visualizes cross-modal and intra-modal attentions through easily readable heatmaps, and (3) plots the hidden representations of vision and language tokens as they pass through the transformer layers. In this paper, we demonstrate the functionalities of VL-InterpreT through the analysis of KD-VLP, an end-to-end pretraining vision-language multimodal transformer-based model, in the tasks of Visual Commonsense Reasoning (VCR) and WebQA, two visual question answering benchmarks. Furthermore, we also present a few interesting findings about multimodal transformer behaviors that were learned through our tool.
    Local Latin Hypercube Refinement for Multi-objective Design Uncertainty Optimization. (arXiv:2108.08890v2 [stat.ML] UPDATED)
    Optimizing the reliability and the robustness of a design is important but often unaffordable due to high sample requirements. Surrogate models based on statistical and machine learning methods are used to increase the sample efficiency. However, for higher dimensional or multi-modal systems, surrogate models may also require a large number of samples to achieve good results. We propose a sequential sampling strategy for the surrogate based solution of multi-objective reliability based robust design optimization problems. The proposed local Latin hypercube refinement (LoLHR) strategy is model-agnostic and can be combined with any surrogate model because there is no free lunch but possibly a budget one. The proposed method is compared to stationary sampling as well as other proposed strategies from the literature. Gaussian process and support vector regression are both used as surrogate models. Empirical evidence is presented, showing that LoLHR achieves on average better results compared to other surrogate based strategies on the tested examples.
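    As background for the sampling scheme named in the abstract, a basic Latin hypercube design on the unit cube can be sketched as follows. This is only the standard stratified sampling building block, not the LoLHR refinement strategy itself; the function name and its parameters are illustrative assumptions.

```python
import numpy as np


def latin_hypercube(n_samples, n_dims, seed=None):
    """Draw a Latin hypercube sample in [0, 1)^n_dims (hypothetical helper).

    Each dimension is split into n_samples equal-width strata and every
    stratum receives exactly one point.
    """
    rng = np.random.default_rng(seed)
    # Row i starts in stratum i of every dimension, jittered within the stratum.
    strata = np.arange(n_samples)[:, None]
    u = (strata + rng.random((n_samples, n_dims))) / n_samples
    # Shuffle each column independently to decouple the dimensions.
    for j in range(n_dims):
        u[:, j] = rng.permutation(u[:, j])
    return u


if __name__ == "__main__":
    design = latin_hypercube(8, 3, seed=0)
    print(design.shape)
```

    A sequential strategy would add further points of this kind in regions selected by the surrogate model, rather than drawing the whole design up front.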
    LDSA: Learning Dynamic Subtask Assignment in Cooperative Multi-Agent Reinforcement Learning. (arXiv:2205.02561v1 [cs.LG])
    Cooperative multi-agent reinforcement learning (MARL) has made prominent progress in recent years. For training efficiency and scalability, most of the MARL algorithms make all agents share the same policy or value network. However, many complex multi-agent tasks require agents with a variety of specific abilities to handle different subtasks. Sharing parameters indiscriminately may lead to similar behaviors across all agents, which will limit the exploration efficiency and be detrimental to the final performance. To balance the training complexity and the diversity of agents' behaviors, we propose a novel framework for learning dynamic subtask assignment (LDSA) in cooperative MARL. Specifically, we first introduce a subtask encoder that constructs a vector representation for each subtask according to its identity. To reasonably assign agents to different subtasks, we propose an ability-based subtask selection strategy, which can dynamically group agents with similar abilities into the same subtask. Then, we condition the subtask policy on its representation and agents dealing with the same subtask share their experiences to train the subtask policy. We further introduce two regularizers to increase the representation difference between subtasks and avoid agents changing subtasks frequently to stabilize training, respectively. Empirical results show that LDSA learns reasonable and effective subtask assignment for better collaboration and significantly improves the learning performance on the challenging StarCraft II micromanagement benchmark.  ( 2 min )
    One Size Does Not Fit All: The Case for Personalised Word Complexity Models. (arXiv:2205.02564v1 [cs.CL])
    Complex Word Identification (CWI) aims to detect words within a text that a reader may find difficult to understand. It has been shown that CWI systems can improve text simplification, readability prediction and vocabulary acquisition modelling. However, the difficulty of a word is a highly idiosyncratic notion that depends on a reader's first language, proficiency and reading experience. In this paper, we show that personal models are best when predicting word complexity for individual readers. We use a novel active learning framework that allows models to be tailored to individuals and release a dataset of complexity annotations and models as a benchmark for further research.  ( 2 min )
    Quantifying Adaptability in Pre-trained Language Models with 500 Tasks. (arXiv:2112.03204v2 [cs.CL] UPDATED)
    When a neural language model (LM) is adapted to perform a new task, what aspects of the task predict the eventual performance of the model? In NLP, systematic features of LM generalization to individual examples are well characterized, but systematic aspects of LM adaptability to new tasks are not nearly as well understood. We present a large-scale empirical study of the features and limits of LM adaptability using a new benchmark, TaskBench500, built from 500 procedurally generated sequence modeling tasks. These tasks combine core aspects of language processing, including lexical semantics, sequence processing, memorization, logical reasoning, and world knowledge. Using TaskBench500, we evaluate three facets of adaptability, finding that: (1) adaptation procedures differ dramatically in their ability to memorize small datasets; (2) within a subset of task types, adaptation procedures exhibit compositional adaptability to complex tasks; and (3) failure to match training label distributions is explained by mismatches in the intrinsic difficulty of predicting individual labels. Our experiments show that adaptability to new tasks, like generalization to new examples, can be systematically described and understood, and we conclude with a discussion of additional aspects of adaptability that could be studied using the new benchmark.
    REDS: Rule Extraction for Discovering Scenarios. (arXiv:1910.01713v2 [cs.LG] UPDATED)
    Scenario discovery is the process of finding areas of interest, known as scenarios, in data spaces resulting from simulations. For instance, one might search for conditions, i.e., inputs of the simulation model, where the system is unstable. Subgroup discovery methods are commonly used for scenario discovery. They find scenarios in the form of hyperboxes, which are easy to comprehend. Given a computational budget, results tend to get worse as the number of inputs of the simulation model and the cost of simulations increase. We propose a new procedure for scenario discovery from few simulations, dubbed REDS. A key ingredient is using an intermediate machine learning model to label data for subsequent use by conventional subgroup discovery methods. We provide statistical arguments why this is an improvement. In our experiments, REDS reduces the number of simulations required by 50--75\% on average, depending on the quality measure. It is also useful as a semi-supervised subgroup discovery method and for discovering better scenarios from third-party data, when a simulation model is not available.  ( 2 min )
    Dual Octree Graph Networks for Learning Adaptive Volumetric Shape Representations. (arXiv:2205.02825v1 [cs.CV])
    We present an adaptive deep representation of volumetric fields of 3D shapes and an efficient approach to learn this deep representation for high-quality 3D shape reconstruction and auto-encoding. Our method encodes the volumetric field of a 3D shape with an adaptive feature volume organized by an octree and applies a compact multilayer perceptron network for mapping the features to the field value at each 3D position. An encoder-decoder network is designed to learn the adaptive feature volume based on the graph convolutions over the dual graph of octree nodes. The core of our network is a new graph convolution operator defined over a regular grid of features fused from irregular neighboring octree nodes at different levels, which not only reduces the computational and memory cost of the convolutions over irregular neighboring octree nodes, but also improves the performance of feature learning. Our method effectively encodes shape details, enables fast 3D shape reconstruction, and exhibits good generality for modeling 3D shapes out of training categories. We evaluate our method on a set of reconstruction tasks of 3D shapes and scenes and validate its superiority over other existing approaches. Our code, data, and trained models are available at https://wang-ps.github.io/dualocnn.
    FedSPLIT: One-Shot Federated Recommendation System Based on Non-negative Joint Matrix Factorization and Knowledge Distillation. (arXiv:2205.02359v1 [cs.LG])
    Non-negative matrix factorization (NMF) with missing-value completion is a well-known effective Collaborative Filtering (CF) method used to provide personalized user recommendations. However, traditional CF relies on the privacy-invasive collection of users' explicit and implicit feedback to build a central recommender model. One-shot federated learning has recently emerged as a method to mitigate the privacy problem while addressing the traditional communication bottleneck of federated learning. In this paper, we present the first unsupervised one-shot federated CF implementation, named FedSPLIT, based on NMF joint factorization. In our solution, the clients first apply local CF in parallel to build distinct client-specific recommenders. Then, the privacy-preserving local item patterns and biases from each client are shared with the processor to perform joint factorization in order to extract the global item patterns. Extracted patterns are then aggregated to each client to build the local models via knowledge distillation. In our experiments, we demonstrate the feasibility of our approach with standard recommendation datasets. FedSPLIT can obtain results similar to the state of the art (and even outperform it in certain situations) with a substantial decrease in the number of communications.
    Visual Domain Adaptation for Monocular Depth Estimation on Resource-Constrained Hardware. (arXiv:2108.02671v2 [cs.CV] UPDATED)
    Real-world perception systems in many cases build on hardware with limited resources to adhere to cost and power limitations of their carrying system. Deploying deep neural networks on resource-constrained hardware became possible with model compression techniques, as well as efficient and hardware-aware architecture design. However, model adaptation is additionally required due to the diverse operation environments. In this work, we address the problem of training deep neural networks on resource-constrained hardware in the context of visual domain adaptation. We select the task of monocular depth estimation where our goal is to transform a pre-trained model to the target's domain data. While the source domain includes labels, we assume an unlabelled target domain, as it happens in real-world applications. Then, we present an adversarial learning approach that is adapted for training on the device with limited resources. Since visual domain adaptation, i.e. neural network training, has not been previously explored for resource-constrained hardware, we present the first feasibility study for image-based depth estimation. Our experiments show that visual domain adaptation is relevant only for efficient network architectures and training sets at the order of a few hundred samples. Models and code are publicly available.
    Based-CE white-box adversarial attack will not work using super-fitting. (arXiv:2205.02741v1 [cs.LG])
    Deep Neural Networks (DNN) are widely used in various fields due to their powerful performance, but recent studies have shown that deep learning models are vulnerable to adversarial attacks: by adding a slight perturbation to the input, an attacker can make the model produce wrong results. This is especially dangerous for systems with high security requirements, so this paper proposes a new defense method that uses the model's super-fitting status. The model's adversarial robustness (i.e., the accuracy under adversarial attack) is greatly improved in this status. This paper mathematically proves the effectiveness of super-fitting and proposes a method to make the model reach this status quickly: minimize unrelated categories scores (MUCS). Theoretically, super-fitting can resist any existing (even future) CE-based white-box adversarial attack. In addition, this paper uses a variety of powerful attack algorithms to evaluate the adversarial robustness of super-fitting and nearly 50 other defense models from recent conferences. The experimental results show that the super-fitting method in this paper gives the trained model the highest adversarial robustness.
    Uncertainty Minimization for Personalized Federated Semi-Supervised Learning. (arXiv:2205.02438v1 [cs.LG])
    Since federated learning (FL) was introduced as a decentralized learning technique with privacy preservation, statistical heterogeneity of distributed data has remained the main obstacle to achieving robust performance and stable convergence in FL applications. Model personalization methods have been studied to overcome this problem. However, existing approaches mainly assume fully labeled data, which is unrealistic in practice because labeling requires expertise. The primary issue caused by the partially labeled condition is that clients with deficient labeled data can suffer from unfair performance gain because they lack adequate insight into the local distribution to customize the global model. To tackle this problem, 1) we propose a novel personalized semi-supervised learning paradigm which allows partially labeled or unlabeled clients to seek labeling assistance from data-related clients (helper agents), thereby enhancing their perception of local data; 2) based on this paradigm, we design an uncertainty-based data-relation metric to ensure that selected helpers can provide trustworthy pseudo labels instead of misleading the local training; 3) to mitigate the network overload introduced by helper searching, we further develop a helper selection protocol to achieve efficient communication with negligible performance sacrifice. Experiments show that our proposed method can obtain superior performance and more stable convergence than other related works with partially labeled data, especially in highly heterogeneous settings.  ( 2 min )
    DeepBayes -- an estimator for parameter estimation in stochastic nonlinear dynamical models. (arXiv:2205.02264v1 [stat.ML])
    Stochastic nonlinear dynamical systems are ubiquitous in modern, real-world applications. Yet, estimating the unknown parameters of stochastic, nonlinear dynamical models remains a challenging problem. The majority of existing methods employ maximum likelihood or Bayesian estimation. However, these methods suffer from some limitations, most notably the substantial computational time for inference coupled with limited flexibility in application. In this work, we propose DeepBayes estimators that leverage the power of deep recurrent neural networks in learning an estimator. The method consists of first training a recurrent neural network to minimize the mean-squared estimation error over a set of synthetically generated data using models drawn from the model set of interest. The a priori trained estimator can then be used directly for inference by evaluating the network with the estimation data. The deep recurrent neural network architectures can be trained offline and ensure significant time savings during inference. We experiment with two popular recurrent neural networks -- long short term memory network (LSTM) and gated recurrent unit (GRU). We demonstrate the applicability of our proposed method on different example models and perform detailed comparisons with state-of-the-art approaches. We also provide a study on a real-world nonlinear benchmark problem. The experimental evaluations show that the proposed approach is asymptotically as good as the Bayes estimator.  ( 2 min )
    Generative methods for sampling transition paths in molecular dynamics. (arXiv:2205.02818v1 [stat.ML])
    Molecular systems often remain trapped for long times around some local minimum of the potential energy function, before switching to another one -- a behavior known as metastability. Simulating transition paths linking one metastable state to another one is difficult by direct numerical methods. In view of the promises of machine learning techniques, we explore in this work two approaches to more efficiently generate transition paths: sampling methods based on generative models such as variational autoencoders, and importance sampling methods based on reinforcement learning.  ( 2 min )
    Pessimism meets VCG: Learning Dynamic Mechanism Design via Offline Reinforcement Learning. (arXiv:2205.02450v1 [cs.LG])
    Dynamic mechanism design has garnered significant attention from both computer scientists and economists in recent years. By allowing agents to interact with the seller over multiple rounds, where agents' reward functions may change with time and are state dependent, the framework is able to model a rich class of real world problems. In these works, the interaction between agents and sellers are often assumed to follow a Markov Decision Process (MDP). We focus on the setting where the reward and transition functions of such an MDP are not known a priori, and we are attempting to recover the optimal mechanism using an a priori collected data set. In the setting where the function approximation is employed to handle large state spaces, with only mild assumptions on the expressiveness of the function class, we are able to design a dynamic mechanism using offline reinforcement learning algorithms. Moreover, learned mechanisms approximately have three key desiderata: efficiency, individual rationality, and truthfulness. Our algorithm is based on the pessimism principle and only requires a mild assumption on the coverage of the offline data set. To the best of our knowledge, our work provides the first offline RL algorithm for dynamic mechanism design without assuming uniform coverage.  ( 2 min )
    Spiking Graph Convolutional Networks. (arXiv:2205.02767v1 [cs.LG])
    Graph Convolutional Networks (GCNs) achieve an impressive performance due to the remarkable representation ability in learning the graph information. However, GCNs, when implemented on a deep network, require expensive computation power, making them difficult to be deployed on battery-powered devices. In contrast, Spiking Neural Networks (SNNs), which perform a bio-fidelity inference process, offer an energy-efficient neural architecture. In this work, we propose SpikingGCN, an end-to-end framework that aims to integrate the embedding of GCNs with the biofidelity characteristics of SNNs. The original graph data are encoded into spike trains based on the incorporation of graph convolution. We further model biological information processing by utilizing a fully connected layer combined with neuron nodes. In a wide range of scenarios (e.g. citation networks, image graph classification, and recommender systems), our experimental results show that the proposed method could gain competitive performance against state-of-the-art approaches. Furthermore, we show that SpikingGCN on a neuromorphic chip can bring a clear advantage of energy efficiency into graph data analysis, which demonstrates its great potential to construct environment-friendly machine learning models.  ( 2 min )
    Cognitive Radio Resource Scheduling using Multi agent QLearning for LTE. (arXiv:2205.02765v1 [cs.NI])
    In this paper, we propose, implement, and test two novel downlink LTE scheduling algorithms. The algorithms were implemented and tested in Matlab, and they are based on Reinforcement Learning, more specifically the Q-learning technique, for scheduling two types of users. The first algorithm is called a Collaborative scheduling algorithm, and the second algorithm is called a Competitive scheduling algorithm. The first type of scheduled users is the Primary Users, who are the licensed subscribers that pay for their service. The second type of scheduled users is the Secondary Users, who could be unlicensed subscribers that do not pay for their service, device-to-device communications, or sensors. Each user, whether primary or secondary, is considered an agent. In the Collaborative scheduling algorithm, the primary user agents collaborate to make a joint scheduling decision about allocating the resource blocks to each one of them, then the secondary user agents compete among themselves to use the remaining resource blocks. In the Competitive scheduling algorithm, the primary user agents compete among themselves over the available resources, then the secondary user agents compete among themselves over the remaining resources. Experimental results show that both scheduling algorithms converged to almost ninety percent utilization of the spectrum, and provided fair shares of the spectrum among users.  ( 2 min )
    Textless Speech-to-Speech Translation on Real Data. (arXiv:2112.08352v2 [cs.CL] UPDATED)
    We present a textless speech-to-speech translation (S2ST) system that can translate speech from one language into another language and can be built without the need of any text data. Different from existing work in the literature, we tackle the challenge in modeling multi-speaker target speech and train the systems with real-world S2ST data. The key to our approach is a self-supervised unit-based speech normalization technique, which finetunes a pre-trained speech encoder with paired audios from multiple speakers and a single reference speaker to reduce the variations due to accents, while preserving the lexical content. With only 10 minutes of paired data for speech normalization, we obtain on average 3.2 BLEU gain when training the S2ST model on the VoxPopuli S2ST dataset, compared to a baseline trained on un-normalized speech target. We also incorporate automatically mined S2ST data and show an additional 2.0 BLEU gain. To our knowledge, we are the first to establish a textless S2ST technique that can be trained with real-world data and works for multiple language pairs. Audio samples are available at https://facebookresearch.github.io/speech_translation/textless_s2st_real_data/index.html .  ( 2 min )
    Communication-Efficient Adaptive Federated Learning. (arXiv:2205.02719v1 [cs.LG])
    Federated learning is a machine learning training paradigm that enables clients to jointly train models without sharing their own localized data. However, the implementation of federated learning in practice still faces numerous challenges, such as the large communication overhead due to the repetitive server-client synchronization and the lack of adaptivity by SGD-based model updates. Despite that various methods have been proposed for reducing the communication cost by gradient compression or quantization, and the federated versions of adaptive optimizers such as FedAdam are proposed to add more adaptivity, the current federated learning framework still cannot solve the aforementioned challenges all at once. In this paper, we propose a novel communication-efficient adaptive federated learning method (FedCAMS) with theoretical convergence guarantees. We show that in the nonconvex stochastic optimization setting, our proposed FedCAMS achieves the same convergence rate of $O(\frac{1}{\sqrt{TKm}})$ as its non-compressed counterparts. Extensive experiments on various benchmarks verify our theoretical analysis.  ( 2 min )
    CoST: Contrastive Learning of Disentangled Seasonal-Trend Representations for Time Series Forecasting. (arXiv:2202.01575v3 [cs.LG] UPDATED)
    Deep learning has been actively studied for time series forecasting, and the mainstream paradigm is based on the end-to-end training of neural network architectures, ranging from classical LSTM/RNNs to more recent TCNs and Transformers. Motivated by the recent success of representation learning in computer vision and natural language processing, we argue that a more promising paradigm for time series forecasting, is to first learn disentangled feature representations, followed by a simple regression fine-tuning step -- we justify such a paradigm from a causal perspective. Following this principle, we propose a new time series representation learning framework for time series forecasting named CoST, which applies contrastive learning methods to learn disentangled seasonal-trend representations. CoST comprises both time domain and frequency domain contrastive losses to learn discriminative trend and seasonal representations, respectively. Extensive experiments on real-world datasets show that CoST consistently outperforms the state-of-the-art methods by a considerable margin, achieving a 21.3% improvement in MSE on multivariate benchmarks. It is also robust to various choices of backbone encoders, as well as downstream regressors. Code is available at https://github.com/salesforce/CoST.  ( 2 min )
    Polynomial-Time Algorithms for Counting and Sampling Markov Equivalent DAGs with Applications. (arXiv:2205.02654v1 [cs.LG])
    Counting and sampling directed acyclic graphs from a Markov equivalence class are fundamental tasks in graphical causal analysis. In this paper we show that these tasks can be performed in polynomial time, solving a long-standing open problem in this area. Our algorithms are effective and easily implementable. As we show in experiments, these breakthroughs make thought-to-be-infeasible strategies in active learning of causal structures and causal effect identification with regard to a Markov equivalence class practically applicable.  ( 2 min )
    LAWS: Look Around and Warm-Start Natural Gradient Descent for Quantum Neural Networks. (arXiv:2205.02666v1 [quant-ph])
    Variational quantum algorithms (VQAs) have recently received significant attention from the research community due to their promising performance on Noisy Intermediate-Scale Quantum (NISQ) computers. However, VQAs run on parameterized quantum circuits (PQC) with randomly initialized parameters are characterized by barren plateaus (BP) where the gradient vanishes exponentially in the number of qubits. In this paper, we first review quantum natural gradient (QNG), which is one of the most popular algorithms used in VQA, from the classical first-order optimization point of view. Then, we propose a Look Around Warm-Start QNG (LAWS) algorithm to mitigate the widespread BP issues. LAWS is a combinatorial optimization strategy taking advantage of model parameter initialization and fast convergence of QNG. LAWS repeatedly reinitializes the parameter search space for the next iteration's parameter update. The reinitialized parameter search space is carefully chosen by sampling the gradient close to the current optimum. Moreover, we present a unified framework (WS-SGD) for integrating parameter initialization techniques into the optimizer. We provide the convergence proof of the proposed framework for both convex and non-convex objective functions based on the Polyak-Lojasiewicz (PL) condition. Our experimental results show that the proposed algorithm can mitigate the BP and has better generalization ability in quantum classification problems.  ( 2 min )
    Population Predictive Checks. (arXiv:1908.00882v4 [stat.ME] UPDATED)
    Bayesian modeling has become a staple for researchers to articulate assumptions and develop methods tailored for specific data applications. Thanks to recent developments in approximate posterior inference, researchers can easily build, use, and revise complicated Bayesian models for large and rich data. These new abilities, however, bring into focus the problem of model criticism. Researchers need tools to diagnose the fitness of their models, to understand where they fall short, and to guide their revision. In this paper we develop a new method for Bayesian model criticism, the population predictive check (POP-PC). POP-PCs are built on posterior predictive checks (PPCs), a seminal method that checks a model by assessing the posterior predictive distribution on the observed data. However, PPCs use the data twice -- both to calculate the posterior predictive and to evaluate it -- which can lead to overconfident assessments of the quality of a model. POP-PCs, in contrast, compare the posterior predictive distribution to a draw from the population distribution, which in practice is a heldout dataset. We prove this strategy, which blends Bayesian modeling with frequentist assessment, is calibrated, unlike the PPC. Moreover, we demonstrate that calibrating PPC p-values post-hoc does not resolve the "double use of the data" problem. Finally, we study POP-PCs on classical regression and a hierarchical model of text data.  ( 2 min )
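The contrast between a PPC and a POP-PC described above can be sketched on a toy model. This is a minimal illustration under stated assumptions, not the paper's procedure: the Normal model with known variance, the conjugate prior, the sample-variance discrepancy, and the names `posterior_for_mean` and `predictive_p_value` are all hypothetical choices made for the example. The only difference between the two checks here is which dataset the discrepancy is evaluated on: the data used to fit the posterior (PPC) or a held-out dataset (POP-PC).

```python
import random
import statistics


def posterior_for_mean(y, prior_mean=0.0, prior_var=100.0, noise_var=1.0):
    """Conjugate posterior N(post_mean, post_var) for the mean of a
    Normal likelihood with known variance (toy model, assumed here)."""
    n = len(y)
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mean = post_var * (prior_mean / prior_var + sum(y) / noise_var)
    return post_mean, post_var


def predictive_p_value(y_fit, y_check, discrepancy,
                       n_draws=500, noise_var=1.0, seed=0):
    """Fraction of replicated datasets whose discrepancy is at least as
    large as the discrepancy of y_check. The posterior always comes
    from y_fit; y_check is the data the replicas are compared against."""
    rng = random.Random(seed)
    post_mean, post_var = posterior_for_mean(y_fit, noise_var=noise_var)
    observed = discrepancy(y_check)
    exceed = 0
    for _ in range(n_draws):
        # Draw a parameter from the posterior, then a replicated dataset.
        mu = rng.gauss(post_mean, post_var ** 0.5)
        y_rep = [rng.gauss(mu, noise_var ** 0.5) for _ in range(len(y_check))]
        if discrepancy(y_rep) >= observed:
            exceed += 1
    return exceed / n_draws


# The data have standard deviation 3, but the model assumes variance 1,
# so the model is deliberately misspecified.
data_rng = random.Random(1)
y_obs = [data_rng.gauss(0.0, 3.0) for _ in range(100)]
y_heldout = [data_rng.gauss(0.0, 3.0) for _ in range(100)]

# PPC: fit and check on the same data (the data are used twice).
ppc_p = predictive_p_value(y_obs, y_obs, statistics.variance)
# POP-PC style: fit on y_obs, check against held-out data.
pop_pc_p = predictive_p_value(y_obs, y_heldout, statistics.variance)
print(ppc_p, pop_pc_p)
```

In this toy case the misspecification is large enough that both checks flag it. The calibration difference the abstract refers to concerns subtler cases, which this sketch does not attempt to reproduce.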
    Intrinsically Motivated Goal Exploration Processes with Automatic Curriculum Learning. (arXiv:1708.02190v3 [cs.AI] UPDATED)
    Intrinsically motivated spontaneous exploration is a key enabler of autonomous developmental learning in human children. It enables the discovery of skill repertoires through autotelic learning, i.e. the self-generation, self-selection, self-ordering and self-experimentation of learning goals. We present an algorithmic approach called Intrinsically Motivated Goal Exploration Processes (IMGEP) to enable similar properties of autonomous learning in machines. The IMGEP architecture relies on several principles: 1) self-generation of goals, generalized as parameterized fitness functions; 2) selection of goals based on intrinsic rewards; 3) exploration with incremental goal-parameterized policy search and exploitation with a batch learning algorithm; 4) systematic reuse of information acquired when targeting a goal for improving towards other goals. We present a particularly efficient form of IMGEP, called AMB, that uses a population-based policy and an object-centered spatio-temporal modularity. We provide several implementations of this architecture and demonstrate their ability to automatically generate a learning curriculum within several experimental setups. One of these experiments includes a real humanoid robot exploring multiple spaces of goals with several hundred continuous dimensions and with distractors. While no particular target goal is provided to these autotelic agents, this curriculum allows the discovery of diverse skills that act as stepping stones for learning more complex skills, e.g. nested tool use.  ( 2 min )
    Communication-Efficient Device Scheduling for Federated Learning Using Stochastic Optimization. (arXiv:2201.07912v2 [cs.LG] UPDATED)
    Federated learning (FL) is a useful tool in distributed machine learning that utilizes users' local datasets in a privacy-preserving manner. However, when deploying FL in a constrained wireless environment, training models in a time-efficient manner can be a challenging task due to intermittent connectivity of devices, heterogeneous connection quality, and non-i.i.d. data. In this paper, we provide a novel convergence analysis of non-convex loss functions using FL on both i.i.d. and non-i.i.d. datasets with arbitrary device selection probabilities for each round. Then, using the derived convergence bound, we use stochastic optimization to develop a new client selection and power allocation algorithm that minimizes a function of the convergence bound and the average communication time under a transmit power constraint. We find an analytical solution to the minimization problem. One key feature of the algorithm is that knowledge of the channel statistics is not required and only the instantaneous channel state information needs to be known. Using the FEMNIST and CIFAR-10 datasets, we show through simulations that the communication time can be significantly decreased using our algorithm, compared to uniformly random participation.  ( 2 min )
    A Graph Attention Learning Approach to Antenna Tilt Optimization. (arXiv:2112.14843v2 [cs.LG] UPDATED)
    6G will move mobile networks towards increasing levels of complexity. To deal with this complexity, optimization of network parameters is key to ensure high performance and timely adaptivity to dynamic network environments. The optimization of the antenna tilt provides a practical and cost-efficient method to improve coverage and capacity in the network. Previous methods based on Reinforcement Learning (RL) have shown great promise for tilt optimization by learning adaptive policies outperforming traditional tilt optimization methods. However, most existing RL methods are based on single-cell features representation, which fails to fully characterize the agent state, resulting in suboptimal performance. Also, most of such methods lack scalability, due to state-action explosion, and generalization ability. In this paper, we propose a Graph Attention Q-learning (GAQ) algorithm for tilt optimization. GAQ relies on a graph attention mechanism to select relevant neighbors information, improve the agent state representation, and update the tilt control policy based on a history of observations using a Deep Q-Network (DQN). We show that GAQ efficiently captures important network information and outperforms standard DQN with local information by a large margin. In addition, we demonstrate its ability to generalize to network deployments of different sizes and densities.  ( 2 min )
    Training Recurrent Neural Networks by Sequential Least Squares and the Alternating Direction Method of Multipliers. (arXiv:2112.15348v2 [cs.LG] UPDATED)
    This paper proposes a novel algorithm for training recurrent neural network models of nonlinear dynamical systems from an input/output training dataset. Arbitrary convex and twice-differentiable loss functions and regularization terms are handled by sequential least squares and either a line-search (LS) or a trust-region method à la Levenberg-Marquardt (LM) for ensuring convergence. In addition, to handle non-smooth regularization terms such as $\ell_1$, $\ell_0$, and group-Lasso regularizers, as well as to impose possibly non-convex constraints such as integer and mixed-integer constraints, we combine sequential least squares with the alternating direction method of multipliers (ADMM). We call the resulting algorithm NAILS (nonconvex ADMM iterations and least squares) when line search (LS) is used, or NAILM if a trust-region method (LM) is employed instead. The training method, which is also applicable to feedforward neural networks as a special case, is tested in two nonlinear system identification problems.  ( 2 min )
    The Role of Explainability in Assuring Safety of Machine Learning in Healthcare. (arXiv:2109.00520v2 [cs.LG] UPDATED)
    Established approaches to assuring safety-critical systems and software are difficult to apply to systems employing ML where there is no clear, pre-defined specification against which to assess validity. This problem is exacerbated by the "opaque" nature of ML where the learnt model is not amenable to human scrutiny. Explainable AI (XAI) methods have been proposed to tackle this issue by producing human-interpretable representations of ML models which can help users to gain confidence and build trust in the ML system. However, little work explicitly investigates the role of explainability for safety assurance in the context of ML development. This paper identifies ways in which XAI methods can contribute to safety assurance of ML-based systems. It then uses a concrete ML-based clinical decision support system, concerning weaning of patients from mechanical ventilation, to demonstrate how XAI methods can be employed to produce evidence to support safety assurance. The results are also represented in a safety argument to show where, and in what way, XAI methods can contribute to a safety case. Overall, we conclude that XAI methods have a valuable role in safety assurance of ML-based systems in healthcare but that they are not sufficient in themselves to assure safety.  ( 2 min )
    View-labels Are Indispensable: A Multifacet Complementarity Study of Multi-view Clustering. (arXiv:2205.02507v1 [cs.LG])
    Consistency and complementarity are two key ingredients for boosting multi-view clustering (MVC). Recently, with the introduction of popular contrastive learning, the consistency learning of views has been further enhanced in MVC, leading to promising performance. By contrast, complementarity has not received sufficient attention except in the feature facet, where the Hilbert Schmidt Independence Criterion (HSIC) term or an independent encoder-decoder network is usually adopted to capture view-specific information. This motivates us to reconsider the complementarity learning of views comprehensively from multiple facets, including the feature, view-label, and contrast facets, while maintaining view consistency. We empirically find that all the facets contribute to complementarity learning, especially the view-label facet, which is usually neglected by existing methods. Based on this, we develop a novel Multifacet Complementarity learning framework for Multi-View Clustering (MCMVC), which fuses multifacet complementarity information, in particular by explicitly embedding the view-label information. To the best of our knowledge, this is the first use of view-labels to explicitly guide the complementarity learning of views. Compared with the SOTA baseline, MCMVC achieves remarkable improvements, e.g., by average margins over 5.00% and 7.00% respectively in complete and incomplete MVC settings on Caltech101-20 in terms of three evaluation metrics.  ( 2 min )
    FAITH: Few-Shot Graph Classification with Hierarchical Task Graphs. (arXiv:2205.02435v1 [cs.LG])
    Few-shot graph classification aims at predicting classes for graphs, given limited labeled graphs for each class. To tackle the bottleneck of label scarcity, recent works propose to incorporate few-shot learning frameworks for fast adaptations to graph classes with limited labeled graphs. Specifically, these works propose to accumulate meta-knowledge across diverse meta-training tasks, and then generalize such meta-knowledge to the target task with a disjoint label set. However, existing methods generally ignore task correlations among meta-training tasks while treating them independently. Nevertheless, such task correlations can advance the model generalization to the target task for better classification performance. On the other hand, it remains non-trivial to utilize task correlations due to the complex components in a large number of meta-training tasks. To deal with this, we propose a novel few-shot learning framework FAITH that captures task correlations via constructing a hierarchical task graph at different granularities. Then we further design a loss-based sampling strategy to select tasks with more correlated classes. Moreover, a task-specific classifier is proposed to utilize the learned task correlations for few-shot classification. Extensive experiments on four prevalent few-shot graph classification datasets demonstrate the superiority of FAITH over other state-of-the-art baselines.  ( 2 min )
    Second-Order Sensitivity Analysis for Bilevel Optimization. (arXiv:2205.02329v1 [math.OC])
    In this work we derive a second-order approach to bilevel optimization, a type of mathematical programming in which the solution to a parameterized optimization problem (the "lower" problem) is itself to be optimized (in the "upper" problem) as a function of the parameters. Many existing approaches to bilevel optimization employ first-order sensitivity analysis, based on the implicit function theorem (IFT), for the lower problem to derive a gradient of the lower problem solution with respect to its parameters; this IFT gradient is then used in a first-order optimization method for the upper problem. This paper extends this sensitivity analysis to provide second-order derivative information of the lower problem (which we call the IFT Hessian), enabling the usage of faster-converging second-order optimization methods at the upper level. Our analysis shows that (i) much of the computation already used to produce the IFT gradient can be reused for the IFT Hessian, (ii) error bounds derived for the IFT gradient readily apply to the IFT Hessian, and (iii) computing IFT Hessians can significantly reduce overall computation by extracting more information from each lower-level solve. We corroborate our findings and demonstrate the broad range of applications of our method by applying it to problem instances of least squares hyperparameter auto-tuning, multi-class SVM auto-tuning, and inverse optimal control.
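For readers unfamiliar with the first-order IFT gradient that the abstract builds on, the standard textbook derivation is short. The notation below is an assumption introduced for illustration (lower objective $f(z, p)$, upper objective $g(z, p)$, lower solution $z^\star(p)$) and is not taken from the paper; it also assumes an unconstrained, twice-differentiable lower problem whose Hessian $\nabla^2_{zz} f$ is invertible at the solution.

```latex
% Assumed notation (not from the abstract): lower objective f(z, p), upper objective g(z, p).
z^\star(p) = \arg\min_z f(z, p)
\quad\Rightarrow\quad
\nabla_z f\bigl(z^\star(p), p\bigr) = 0 .

% Differentiating the stationarity condition with respect to p:
\nabla^2_{zz} f \, \frac{\partial z^\star}{\partial p} + \nabla^2_{zp} f = 0
\quad\Rightarrow\quad
\frac{\partial z^\star}{\partial p} = -\bigl(\nabla^2_{zz} f\bigr)^{-1} \nabla^2_{zp} f .

% Chain rule for the upper problem:
\frac{d}{dp}\, g\bigl(z^\star(p), p\bigr)
= \nabla_p g + \Bigl(\frac{\partial z^\star}{\partial p}\Bigr)^{\!\top} \nabla_z g .
```

The factorization of $\nabla^2_{zz} f$ needed for this gradient is the kind of computation the abstract says can be reused when forming second-order information.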
    Predicting Basin Stability of Power Grids using Graph Neural Networks. (arXiv:2108.08230v3 [physics.soc-ph] UPDATED)
    The prediction of dynamical stability of power grids becomes more important and challenging with increasing shares of renewable energy sources due to their decentralized structure, reduced inertia and volatility. We investigate the feasibility of applying graph neural networks (GNN) to predict dynamic stability of synchronisation in complex power grids using the single-node basin stability (SNBS) as a measure. To do so, we generate two synthetic datasets for grids with 20 and 100 nodes respectively and estimate SNBS using Monte-Carlo sampling. Those datasets are used to train and evaluate the performance of eight different GNN-models. All models use the full graph without simplifications as input and predict SNBS in a nodal-regression-setup. We show that SNBS can be predicted in general and the performance significantly changes using different GNN-models. Furthermore, we observe interesting transfer capabilities of our approach: GNN-models trained on smaller grids can directly be applied on larger grids without the need of retraining.
    Non-Euclidean Differentially Private Stochastic Convex Optimization: Optimal Rates in Linear Time. (arXiv:2103.01278v2 [cs.LG] UPDATED)
    Differentially private (DP) stochastic convex optimization (SCO) is a fundamental problem, where the goal is to approximately minimize the population risk with respect to a convex loss function, given a dataset of $n$ i.i.d. samples from a distribution, while satisfying differential privacy with respect to the dataset. Most of the existing works in the literature of private convex optimization focus on the Euclidean (i.e., $\ell_2$) setting, where the loss is assumed to be Lipschitz (and possibly smooth) w.r.t. the $\ell_2$ norm over a constraint set with bounded $\ell_2$ diameter. Algorithms based on noisy stochastic gradient descent (SGD) are known to attain the optimal excess risk in this setting. In this work, we conduct a systematic study of DP-SCO for $\ell_p$-setups under a standard smoothness assumption on the loss. For $1< p\leq 2$, under a standard smoothness assumption, we give a new, linear-time DP-SCO algorithm with optimal excess risk. Previously known constructions with optimal excess risk for $1< p <2$ run in super-linear time in $n$. For $p=1$, we give an algorithm with nearly optimal excess risk. Our result for the $\ell_1$-setup also extends to general polyhedral norms and feasible sets. Moreover, we show that the excess risk bounds resulting from our algorithms for $1\leq p \leq 2$ are attained with high probability. For $2 < p \leq \infty$, we show that existing linear-time constructions for the Euclidean setup attain a nearly optimal excess risk in the low-dimensional regime. As a consequence, we show that such constructions attain a nearly optimal excess risk for $p=\infty$. Our work draws upon concepts from the geometry of normed spaces, such as the notions of regularity, uniform convexity, and uniform smoothness.
    Human Parity on CommonsenseQA: Augmenting Self-Attention with External Attention. (arXiv:2112.03254v3 [cs.CL] UPDATED)
    Most of today's AI systems focus on using self-attention mechanisms and transformer architectures on large amounts of diverse data to achieve impressive performance gains. In this paper, we propose to augment the transformer architecture with an external attention mechanism to bring external knowledge and context to bear. By integrating external information into the prediction process, we hope to reduce the need for ever-larger models and increase the democratization of AI systems. We find that the proposed external attention mechanism can significantly improve the performance of existing AI systems, allowing practitioners to easily customize foundation AI models to many diverse downstream applications. In particular, we focus on the task of Commonsense Reasoning, demonstrating that the proposed external attention mechanism can augment existing transformer models and significantly improve the model's reasoning capabilities. The proposed system, Knowledgeable External Attention for commonsense Reasoning (KEAR), reaches human parity on the open CommonsenseQA research benchmark with an accuracy of 89.4% in comparison to the human accuracy of 88.9%.
    Maximum Entropy RL (Provably) Solves Some Robust RL Problems. (arXiv:2103.06257v2 [cs.LG] UPDATED)
    Many potential applications of reinforcement learning (RL) require guarantees that the agent will perform well in the face of disturbances to the dynamics or reward function. In this paper, we prove theoretically that maximum entropy (MaxEnt) RL maximizes a lower bound on a robust RL objective, and thus can be used to learn policies that are robust to some disturbances in the dynamics and the reward function. While this capability of MaxEnt RL has been observed empirically in prior work, to the best of our knowledge our work provides the first rigorous proof and theoretical characterization of the MaxEnt RL robust set. While a number of prior robust RL algorithms have been designed to handle similar disturbances to the reward function or dynamics, these methods typically require additional moving parts and hyperparameters on top of a base RL algorithm. In contrast, our results suggest that MaxEnt RL by itself is robust to certain disturbances, without requiring any additional modifications. While this does not imply that MaxEnt RL is the best available robust RL method, MaxEnt RL is a simple robust RL method with appealing formal guarantees.
    Dynamic Bayesian Network Auxiliary ABC-SMC for Hybrid Model Bayesian Inference to Accelerate Biomanufacturing Process Mechanism Learning and Robust Control. (arXiv:2205.02410v1 [stat.ML])
    Driven by the critical needs of biomanufacturing 4.0, we present a probabilistic knowledge graph hybrid model characterizing complex spatial-temporal causal interdependencies of underlying bioprocessing mechanisms. It can faithfully capture the important properties, including nonlinear reactions, partially observed state, and nonstationary dynamics. Given limited process observations, we derive a posterior distribution quantifying model uncertainty, which can facilitate mechanism learning and support robust process control. To avoid evaluation of intractable likelihood, Approximate Bayesian Computation sampling with Sequential Monte Carlo (ABC-SMC) is developed to approximate the posterior distribution. Given high stochastic and model uncertainties, it is computationally expensive to match process output trajectories. Therefore, we propose a linear Gaussian dynamic Bayesian network (LG-DBN) auxiliary likelihood-based ABC-SMC algorithm. Through matching observed and simulated summary statistics, the proposed approach can dramatically reduce the computation cost and accelerate the posterior approximation convergence.
    SueNes: A Weakly Supervised Approach to Evaluating Single-Document Summarization via Negative Sampling. (arXiv:2005.06377v3 [cs.CL] UPDATED)
    Canonical automatic summary evaluation metrics, such as ROUGE, focus on lexical similarity, which captures neither semantics nor linguistic quality well, and require a reference summary, which is costly to obtain. Recently, there have been a growing number of efforts to alleviate either or both of the two drawbacks. In this paper, we present a proof-of-concept study of a weakly supervised summary evaluation approach that does not require reference summaries. Massive data in existing summarization datasets are transformed for training by pairing documents with corrupted reference summaries. In cross-domain tests, our strategy outperforms baselines with promising improvements and shows a great advantage in gauging linguistic qualities over all metrics.  ( 2 min )
    Characterizing Intersectional Group Fairness with Worst-Case Comparisons. (arXiv:2101.01673v5 [cs.LG] UPDATED)
    Machine Learning or Artificial Intelligence algorithms have gained considerable scrutiny in recent times owing to their propensity towards imitating and amplifying existing prejudices in society. This has led to a niche but growing body of work that identifies and attempts to fix these biases. A first step towards making these algorithms more fair is designing metrics that measure unfairness. Most existing work in this field deals with either a binary view of fairness (protected vs. unprotected groups) or politically defined categories (race or gender). Such categorization misses the important nuance of intersectionality - biases can often be amplified in subgroups that combine membership from different categories, especially if such a subgroup is particularly underrepresented in historical platforms of opportunity. In this paper, we discuss why fairness metrics need to be looked at under the lens of intersectionality, identify existing work in intersectional fairness, suggest a simple worst case comparison method to expand the definitions of existing group fairness metrics to incorporate intersectionality, and finally conclude with the social, legal and political framework to handle intersectional fairness in the modern context.  ( 2 min )
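    A worst-case comparison across intersectional subgroups can be sketched as follows. This is a minimal illustration that assumes the comparison is the ratio of the lowest to the highest positive-prediction rate over all subgroups; whether this matches the paper's exact definition is an assumption, and the records and attribute values are made up.

```python
from collections import defaultdict

def subgroup_positive_rates(records):
    """records: iterable of (attributes_tuple, prediction), prediction in {0, 1}.
    Returns the positive-prediction rate for each intersectional subgroup."""
    counts = defaultdict(lambda: [0, 0])  # subgroup -> [positives, total]
    for attrs, pred in records:
        counts[attrs][0] += pred
        counts[attrs][1] += 1
    return {group: pos / total for group, (pos, total) in counts.items()}

def worst_case_ratio(rates):
    """Lowest rate divided by highest rate; 1.0 means parity across subgroups."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # every subgroup has rate 0, so they are trivially equal
    return min(rates.values()) / highest

# Hypothetical data: two protected attributes, four intersectional subgroups.
records = [
    (("A", "x"), 1), (("A", "x"), 1),
    (("A", "y"), 1), (("A", "y"), 0),
    (("B", "x"), 1), (("B", "x"), 0),
    (("B", "y"), 0), (("B", "y"), 0), (("B", "y"), 0), (("B", "y"), 1),
]
rates = subgroup_positive_rates(records)
ratio = worst_case_ratio(rates)
```

    Comparing only "A" against "B", or only "x" against "y", would hide the fact that the ("B", "y") subgroup fares worst; the ratio over the intersections exposes it.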
    Physics-Informed Deep Reversible Regression Model for Temperature Field Reconstruction of Heat-Source Systems. (arXiv:2106.11929v4 [cs.LG] UPDATED)
    Temperature monitoring over the lifetime of heat-source components in engineering systems is essential to guarantee the normal operation and working life of these components. However, prior methods, which mainly use interpolation to reconstruct the temperature field from limited monitoring points, require large amounts of temperature tensors for an accurate estimation. This may decrease the availability and reliability of the system and sharply increase the monitoring cost. To solve this problem, this work develops a novel physics-informed deep reversible regression model for temperature field reconstruction of heat-source systems (TFR-HSS), which can better reconstruct the temperature field from limited monitoring points in an unsupervised manner. First, we define the TFR-HSS task mathematically, model it numerically, and thereby transform it into an image-to-image regression problem. Then this work develops the deep reversible regression model, which can better learn the physical information, especially over the boundary. Finally, considering the physical characteristics of heat conduction as well as the boundary conditions, this work proposes a physics-informed reconstruction loss comprising four training losses and jointly learns the deep surrogate model with these losses without supervision. Experimental studies have been conducted on typical two-dimensional heat-source systems to demonstrate the effectiveness of the proposed method.  ( 2 min )
    Dropout Strikes Back: Improved Uncertainty Estimation via Diversity Sampling. (arXiv:2003.03274v3 [cs.LG] UPDATED)
    Uncertainty estimation for machine learning models is of high importance in many scenarios such as constructing the confidence intervals for model predictions and detection of out-of-distribution or adversarially generated points. In this work, we show that modifying the sampling distributions for dropout layers in neural networks improves the quality of uncertainty estimation. Our main idea consists of two main steps: computing data-driven correlations between neurons and generating samples, which include maximally diverse neurons. In a series of experiments on simulated and real-world data, we demonstrate that the diversification via determinantal point processes-based sampling achieves state-of-the-art results in uncertainty estimation for regression and classification tasks. An important feature of our approach is that it does not require any modification to the models or training procedures, allowing straightforward application to any deep learning model with dropout layers.  ( 2 min )
    Development of Interpretable Machine Learning Models to Detect Arrhythmia based on ECG Data. (arXiv:2205.02803v1 [cs.LG])
    The analysis of electrocardiogram (ECG) signals can be time consuming as it is performed manually by cardiologists. Therefore, automation through machine learning (ML) classification is being increasingly proposed which would allow ML models to learn the features of a heartbeat and detect abnormalities. The lack of interpretability hinders the application of Deep Learning in healthcare. Through interpretability of these models, we would understand how a machine learning algorithm makes its decisions and what patterns are being followed for classification. This thesis builds Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) classifiers based on state-of-the-art models and compares their performance and interpretability to shallow classifiers. Here, both global and local interpretability methods are exploited to understand the interaction between dependent and independent variables across the entire dataset and to examine model decisions in each sample, respectively. Partial Dependence Plots, Shapley Additive Explanations, Permutation Feature Importance, and Gradient Weighted Class Activation Maps (Grad-CAM) are the four interpretability techniques implemented on time-series ML models classifying ECG rhythms. In particular, we exploit Grad-CAM, which is a local interpretability technique and examine whether its interpretability varies between correctly and incorrectly classified ECG beats within each class. Furthermore, the classifiers are evaluated using K-Fold cross-validation and Leave Groups Out techniques, and we use non-parametric statistical testing to examine whether differences are significant. It was found that Grad-CAM was the most effective interpretability technique at explaining predictions of proposed CNN and LSTM models. We concluded that all high performing classifiers looked at the QRS complex of the ECG rhythm when making predictions.  ( 2 min )
    Optimising Equal Opportunity Fairness in Model Training. (arXiv:2205.02393v1 [cs.LG])
    Real-world datasets often encode stereotypes and societal biases. Such biases can be implicitly captured by trained models, leading to biased predictions and exacerbating existing societal preconceptions. Existing debiasing methods, such as adversarial training and removing protected information from representations, have been shown to reduce bias. However, a disconnect between fairness criteria and training objectives makes it difficult to reason theoretically about the effectiveness of different techniques. In this work, we propose two novel training objectives which directly optimise for the widely-used criterion of equal opportunity, and show that they are effective in reducing bias while maintaining high performance over two classification tasks.  ( 2 min )
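    The equal opportunity criterion asks that the true positive rate be the same across protected groups. The sketch below measures the gap as an evaluation quantity; it does not reproduce the two training objectives the abstract proposes, and the labels, predictions, and group names are made-up illustrative values.

```python
def true_positive_rate(y_true, y_pred):
    """Fraction of actual positives that were predicted positive."""
    preds_on_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not preds_on_positives:
        raise ValueError("group has no positive examples")
    return sum(preds_on_positives) / len(preds_on_positives)

def equal_opportunity_gap(y_true, y_pred, groups):
    """Largest difference in true positive rate between any two groups.
    Returns (gap, per-group TPR dict); a gap of 0 satisfies the criterion."""
    tprs = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        tprs[g] = true_positive_rate([y_true[i] for i in idx],
                                     [y_pred[i] for i in idx])
    return max(tprs.values()) - min(tprs.values()), tprs

# Hypothetical binary labels and predictions for two groups.
y_true = [1, 1, 1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
gap, tprs = equal_opportunity_gap(y_true, y_pred, groups)
```

    Only examples with a positive true label enter the computation, which is what distinguishes equal opportunity from criteria based on overall prediction rates.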
    Meta-learning Feature Representations for Adaptive Gaussian Processes via Implicit Differentiation. (arXiv:2205.02708v1 [cs.LG])
    We propose Adaptive Deep Kernel Fitting (ADKF), a general framework for learning deep kernels by interpolating between meta-learning and conventional learning. Our approach employs a bilevel optimization objective where we meta-learn feature representations that are generally useful across tasks, in the sense that task-specific Gaussian process models estimated on top of such features achieve the lowest possible predictive loss on average across tasks. We solve the resulting nested optimization problem using the implicit function theorem. We show that ADKF contains Deep Kernel Learning and Deep Kernel Transfer as special cases. Although ADKF is a completely general method, we argue that it is especially well-suited for drug discovery problems and demonstrate that it significantly outperforms previous state-of-the-art methods on a variety of real-world few-shot molecular property prediction tasks and out-of-domain molecular optimization tasks.  ( 2 min )
    StyleAlign: Analysis and Applications of Aligned StyleGAN Models. (arXiv:2110.11323v2 [cs.CV] UPDATED)
    In this paper, we perform an in-depth study of the properties and applications of aligned generative models. We refer to two models as aligned if they share the same architecture, and one of them (the child) is obtained from the other (the parent) via fine-tuning to another domain, a common practice in transfer learning. Several works already utilize some basic properties of aligned StyleGAN models to perform image-to-image translation. Here, we perform the first detailed exploration of model alignment, also focusing on StyleGAN. First, we empirically analyze aligned models and provide answers to important questions regarding their nature. In particular, we find that the child model's latent spaces are semantically aligned with those of the parent, inheriting incredibly rich semantics, even for distant data domains such as human faces and churches. Second, equipped with this better understanding, we leverage aligned models to solve a diverse set of tasks. In addition to image translation, we demonstrate fully automatic cross-domain image morphing. We further show that zero-shot vision tasks may be performed in the child domain, while relying exclusively on supervision in the parent domain. We demonstrate qualitatively and quantitatively that our approach yields state-of-the-art results, while requiring only simple fine-tuning and inversion.  ( 2 min )
    Can collaborative learning be private, robust and scalable? (arXiv:2205.02652v1 [cs.LG])
    We investigate the effectiveness of combining differential privacy, model compression and adversarial training to improve the robustness of models against adversarial samples in train- and inference-time attacks. We explore the applications of these techniques as well as their combinations to determine which method performs best, without a significant utility trade-off. Our investigation provides a practical overview of various methods that allow one to achieve competitive model performance, a significant reduction in model size and improved empirical adversarial robustness without a severe performance degradation.  ( 2 min )
    Rapid Locomotion via Reinforcement Learning. (arXiv:2205.02824v1 [cs.RO])
    Agile maneuvers such as sprinting and high-speed turning in the wild are challenging for legged robots. We present an end-to-end learned controller that achieves record agility for the MIT Mini Cheetah, sustaining speeds up to 3.9 m/s. This system runs and turns fast on natural terrains like grass, ice, and gravel and responds robustly to disturbances. Our controller is a neural network trained in simulation via reinforcement learning and transferred to the real world. The two key components are (i) an adaptive curriculum on velocity commands and (ii) an online system identification strategy for sim-to-real transfer leveraged from prior work. Videos of the robot's behaviors are available at: https://agility.csail.mit.edu/  ( 2 min )
    Multi-fold Correlation Attention Network for Predicting Traffic Speeds with Heterogeneous Frequency. (arXiv:2104.09083v2 [cs.LG] UPDATED)
    Substantial efforts have been devoted to the investigation of spatiotemporal correlations for improving traffic speed prediction accuracy. However, existing works typically model the correlations based solely on the observed traffic state (e.g. traffic speed) without due consideration that different correlation measurements of the traffic data could exhibit a diverse set of patterns under different traffic situations. In addition, the existing works assume that all road segments can employ the same sampling frequency of traffic states, which is impractical. In this paper, we propose new measurements to model the spatial correlations among traffic data and show that the resulting correlation patterns vary significantly under various traffic situations. We propose a Heterogeneous Spatial Correlation (HSC) model to capture the spatial correlation based on a specific measurement, where the traffic data of varying road segments can be heterogeneous (i.e. obtained with different sampling frequency). We propose a Multi-fold Correlation Attention Network (MCAN), which relies on the HSC model to explore multi-fold spatial correlations and leverage LSTM networks to capture multi-fold temporal correlations to provide discriminating features in order to achieve accurate traffic prediction. The learned multi-fold spatiotemporal correlations together with contextual factors are fused with attention mechanism to make the final predictions. Experiments on real-world datasets demonstrate that the proposed MCAN model outperforms the state-of-the-art baselines.  ( 2 min )
    On the Dual Formulation of Boosting Algorithms. (arXiv:0901.3590v6 [cs.LG] UPDATED)
    We study boosting algorithms from a new perspective. We show that the Lagrange dual problems of AdaBoost, LogitBoost and soft-margin LPBoost with generalized hinge loss are all entropy maximization problems. By looking at the dual problems of these boosting algorithms, we show that the success of boosting algorithms can be understood in terms of maintaining a better margin distribution by maximizing margins and at the same time controlling the margin variance. We also theoretically prove that, approximately, AdaBoost maximizes the average margin, instead of the minimum margin. The duality formulation also enables us to develop column generation based optimization algorithms, which are totally corrective. We show that they exhibit almost identical classification results to that of standard stage-wise additive boosting algorithms but with much faster convergence rates. Therefore fewer weak classifiers are needed to build the ensemble using our proposed optimization technique.
    Contact Points Discovery for Soft-Body Manipulations with Differentiable Physics. (arXiv:2205.02835v1 [cs.RO])
    Differentiable physics has recently been shown as a powerful tool for solving soft-body manipulation tasks. However, the differentiable physics solver often gets stuck when the initial contact points of the end effectors are sub-optimal or when performing multi-stage tasks that require contact point switching, which often leads to local minima. To address this challenge, we propose a contact point discovery approach (CPDeform) that guides the stand-alone differentiable physics solver to deform various soft-body plasticines. The key idea of our approach is to integrate optimal transport-based contact points discovery into the differentiable physics solver to overcome the local minima from initial contact points or contact switching. On single-stage tasks, our method can automatically find suitable initial contact points based on transport priorities. On complex multi-stage tasks, we can iteratively switch the contact points of end-effectors based on transport priorities. To evaluate the effectiveness of our method, we introduce PlasticineLab-M that extends the existing differentiable physics benchmark PlasticineLab to seven new challenging multi-stage soft-body manipulation tasks. Extensive experimental results suggest that: 1) on multi-stage tasks that are infeasible for the vanilla differentiable physics solver, our approach discovers contact points that efficiently guide the solver to completion; 2) on tasks where the vanilla solver performs sub-optimally or near-optimally, our contact point discovery method performs better than or on par with the manipulation performance obtained with handcrafted contact points.
    Time Shifts to Reduce the Size of Reservoir Computers. (arXiv:2205.02267v1 [cs.NE])
    A reservoir computer is a type of dynamical system arranged to do computation. Typically, a reservoir computer is constructed by connecting a large number of nonlinear nodes in a network that includes recurrent connections. In order to achieve accurate results, the reservoir usually contains hundreds to thousands of nodes. This high dimensionality makes it difficult to analyze the reservoir computer using tools from dynamical systems theory. Additionally, the need to create and connect large numbers of nonlinear nodes makes it difficult to design and build analog reservoir computers that can be faster and consume less power than digital reservoir computers. We demonstrate here that a reservoir computer may be divided into two parts; a small set of nonlinear nodes (the reservoir), and a separate set of time-shifted reservoir output signals. The time-shifted output signals serve to increase the rank and memory of the reservoir computer, and the set of nonlinear nodes may create an embedding of the input dynamical system. We use this time-shifting technique to obtain excellent performance from an opto-electronic delay-based reservoir computer with only a small number of virtual nodes. Because only a few nonlinear nodes are required, construction of a reservoir computer becomes much easier, and delay-based reservoir computers can operate at much higher speeds.
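    The idea of augmenting a small reservoir with time-shifted copies of its output signals can be sketched as a feature-construction step. The shift values, the toy signals, and the function name below are illustrative assumptions; how the paper chooses shifts and trains the readout is not reproduced here.

```python
def time_shifted_features(node_outputs, shifts):
    """Build augmented feature rows from reservoir node signals.

    node_outputs: list over time steps, each entry a list of node values.
    shifts: non-negative delays in time steps (0 means the unshifted signal).
    Returns one row per time step t >= max(shifts); each row concatenates
    the node values at t - s for every s in shifts.
    """
    start = max(shifts)
    rows = []
    for t in range(start, len(node_outputs)):
        row = []
        for s in shifts:
            row.extend(node_outputs[t - s])
        rows.append(row)
    return rows

# Hypothetical two-node reservoir observed for six time steps.
outputs = [[0.1 * t, (0.1 * t) ** 2] for t in range(6)]
features = time_shifted_features(outputs, shifts=[0, 1, 3])
```

    With two nodes and three shifts, each row has six features, so a readout fitted on these rows sees three times as many signals as the reservoir has nodes.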
    General sum stochastic games with networked information flows. (arXiv:2205.02760v1 [cs.LG])
    Inspired by applications such as supply chain management, epidemics, and social networks, we formulate a stochastic game model that addresses three key features common across these domains: 1) network-structured player interactions, 2) pair-wise mixed cooperation and competition among players, and 3) limited global information toward individual decision-making. In combination, these features pose significant challenges for black box approaches taken by deep learning-based multi-agent reinforcement learning (MARL) algorithms and deserve more detailed analysis. We formulate a networked stochastic game with pair-wise general sum objectives and asymmetrical information structure, and empirically explore the effects of information availability on the outcomes of different MARL paradigms such as individual learning and centralized learning decentralized execution.
    Identifying Cause-and-Effect Relationships of Manufacturing Errors using Sequence-to-Sequence Learning. (arXiv:2205.02827v1 [cs.LG])
    In car-body production the pre-formed sheet metal parts of the body are assembled on fully-automated production lines. The body passes through multiple stations in succession, and is processed according to the order requirements. The timely completion of orders depends on the individual station-based operations concluding within their scheduled cycle times. If an error occurs in one station, it can have a knock-on effect, resulting in delays on the downstream stations. To the best of our knowledge, there exist no methods for automatically distinguishing between source and knock-on errors in this setting, as well as establishing a causal relation between them. Utilizing real-time information about conditions collected by a production data acquisition system, we propose a novel vehicle manufacturing analysis system, which uses deep learning to establish a link between source and knock-on errors. We benchmark three sequence-to-sequence models, and introduce a novel composite time-weighted action metric for evaluating models in this context. We evaluate our framework on a real-world car production dataset recorded by Volkswagen Commercial Vehicles. Surprisingly we find that 71.68% of sequences contain either a source or knock-on error. With respect to seq2seq model training, we find that the Transformer demonstrates a better performance compared to LSTM and GRU in this domain, in particular when the prediction range with respect to the durations of future actions is increased.  ( 2 min )
    Maximum n-times Coverage for Vaccine Design. (arXiv:2101.10902v5 [q-bio.QM] UPDATED)
    We introduce the maximum $n$-times coverage problem that selects $k$ overlays to maximize the summed coverage of weighted elements, where each element must be covered at least $n$ times. We also define the min-cost $n$-times coverage problem where the objective is to select the minimum set of overlays such that the sum of the weights of elements that are covered at least $n$ times is at least $\tau$. Maximum $n$-times coverage is a generalization of the multi-set multi-cover problem, is NP-complete, and is not submodular. We introduce two new practical solutions for $n$-times coverage based on integer linear programming and sequential greedy optimization. We show that maximum $n$-times coverage is a natural way to frame peptide vaccine design, and find that it produces a pan-strain COVID-19 vaccine design that is superior to 29 other published designs in predicted population coverage and the expected number of peptides displayed by each individual's HLA molecules.
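    The maximum n-times coverage objective can be written down directly. The sketch below evaluates it and finds the best k overlays by exhaustive search on a tiny made-up instance; it is not the integer linear programming or sequential greedy method the abstract proposes, and the overlay representation (a dict from element to how many times the overlay covers it) is an assumption.

```python
from itertools import combinations

def n_times_coverage(selected, weights, n):
    """Sum of weights of elements covered at least n times by the selected overlays.
    Each overlay maps element -> number of times it covers that element."""
    counts = {}
    for overlay in selected:
        for element, times in overlay.items():
            counts[element] = counts.get(element, 0) + times
    return sum(w for e, w in weights.items() if counts.get(e, 0) >= n)

def best_k_overlays(overlays, weights, k, n):
    """Exhaustive search over all size-k subsets; only viable for tiny instances."""
    return max(
        combinations(range(len(overlays)), k),
        key=lambda idx: n_times_coverage([overlays[i] for i in idx], weights, n),
    )

# Hypothetical instance: three weighted elements, four candidate overlays.
weights = {"a": 0.5, "b": 0.3, "c": 0.2}
overlays = [
    {"a": 1, "b": 1},
    {"a": 1, "c": 1},
    {"a": 1, "b": 1, "c": 2},
    {"a": 1},
]
best = best_k_overlays(overlays, weights, k=2, n=2)
best_value = n_times_coverage([overlays[i] for i in best], weights, n=2)
```

    With n = 2, no single overlay here covers "a" or "b" often enough to score for them, and the payoff appears only once a second overlay is added. That threshold behaviour is why marginal gains are not diminishing.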
    Sound Event Classification in an Industrial Environment: Pipe Leakage Detection Use Case. (arXiv:2205.02706v1 [cs.LG])
    In this work, a multi-stage Machine Learning (ML) pipeline is proposed for pipe leakage detection in an industrial environment. As opposed to other industrial and urban environments, the environment under study includes many interfering background noises, complicating the identification of leaks. Furthermore, the harsh environmental conditions limit the amount of data collected and impose the use of low-complexity algorithms. To address the environment's constraints, the developed ML pipeline applies multiple steps, each addressing the environment's challenges. The proposed ML pipeline first reduces the data dimensionality by feature selection techniques and then incorporates time correlations by extracting time-based features. The resultant features are fed to a Support Vector Machine (SVM) of low-complexity that generalizes well to a small amount of data. An extensive experimental procedure was carried out on two datasets, one with background industrial noise and one without, to evaluate the validity of the proposed pipeline. The SVM hyper-parameters and parameters specific to the pipeline steps were tuned as part of the experimental procedure. The best models obtained from the dataset with industrial noise and leaks were applied to datasets without noise and with and without leaks to test their generalizability. The results show that the model produces excellent results with 99% accuracy and an F1-score of 0.93 and 0.9 for the respective datasets.
    Fixing Malfunctional Objects With Learned Physical Simulation and Functional Prediction. (arXiv:2205.02834v1 [cs.CV])
    This paper studies the problem of fixing malfunctional 3D objects. While previous works focus on building passive perception models to learn the functionality from static 3D objects, we argue that functionality is reckoned with respect to the physical interactions between the object and the user. Given a malfunctional object, humans can perform mental simulations to reason about its functionality and figure out how to fix it. Inspired by this, we propose FixIt, a dataset that contains about 5k poorly-designed 3D physical objects paired with choices to fix them. To mimic humans' mental simulation process, we present FixNet, a novel framework that seamlessly incorporates perception and physical dynamics. Specifically, FixNet consists of a perception module to extract the structured representation from the 3D point cloud, a physical dynamics prediction module to simulate the results of interactions on 3D objects, and a functionality prediction module to evaluate the functionality and choose the correct fix. Experimental results show that our framework outperforms baseline models by a large margin, and can generalize well to objects with similar interaction types.  ( 2 min )
    pyRDF2Vec: A Python Implementation and Extension of RDF2Vec. (arXiv:2205.02283v1 [cs.LG])
    This paper introduces pyRDF2Vec, a Python software package that reimplements the well-known RDF2Vec algorithm along with several of its extensions. By making the algorithm available in the most popular data science language, and by bundling all extensions into a single place, the use of RDF2Vec is simplified for data scientists. The package is released under an MIT license and structured in such a way as to foster further research into sampling, walking, and embedding strategies, which are vital components of the RDF2Vec algorithm. Several optimisations have been implemented in pyRDF2Vec that allow for more efficient walk extraction than the original algorithm. Furthermore, best practices in terms of code styling, testing, and documentation were applied so that the package is future-proof and external contributions are easier.  ( 2 min )
    Multi-Agent Deep Reinforcement Learning in Vehicular OCC. (arXiv:2205.02672v1 [cs.LG])
    Optical camera communications (OCC) has emerged as a key enabling technology for the seamless operation of future autonomous vehicles. In this paper, we introduce a spectral efficiency optimization approach in vehicular OCC. Specifically, we aim at optimally adapting the modulation order and the relative speed while respecting bit error rate and latency constraints. As the optimization problem is NP-hard, we model it as a Markov decision process (MDP) to enable the use of solutions that can be applied online. We then relax the constrained problem using Lagrangian relaxation before solving it with multi-agent deep reinforcement learning (DRL). We verify the performance of our proposed scheme through extensive simulations and compare it with several variants of our approach and a random method. The evaluation shows that our system achieves significantly higher sum spectral efficiency than the schemes under comparison.  ( 2 min )
    Finding Bipartite Components in Hypergraphs. (arXiv:2205.02771v1 [cs.DS])
    Hypergraphs are important objects to model ternary or higher-order relations of objects, and have a number of applications in analysing many complex datasets occurring in practice. In this work we study a new heat diffusion process in hypergraphs, and employ this process to design a polynomial-time algorithm that approximately finds bipartite components in a hypergraph. We theoretically prove the performance of our proposed algorithm, and compare it against the previous state-of-the-art through extensive experimental analysis on both synthetic and real-world datasets. We find that our new algorithm consistently and significantly outperforms the previous state-of-the-art across a wide range of hypergraphs.  ( 2 min )
    KnitCity: a machine learning-based, game-theoretical framework for prediction assessment and seismic risk policy design. (arXiv:2205.02679v1 [cs.LG])
    Knitted fabric exhibits avalanche-like events when deformed: by analogy with earthquakes, we are interested in predicting these "knitquakes". However, as in most analogous seismic models, the peculiar statistics of the corresponding time-series severely jeopardize this endeavour, due to the time intermittence and scale-invariance of these events. But more importantly, such predictions are hard to assess: depending on the choice of what to predict, the results can be very different and not easily compared. Furthermore, forecasting models may be trained with various generic metrics which ignore some important specificities of the problem at hand, in our case seismic risk. Finally, these models often do not provide a clear strategy regarding the best way to use these predictions in practice. Here we introduce a framework that allows one to design, evaluate and compare not only predictors but also decision-making policies: a model seismically active city subjected to the crackling dynamics observed in the mechanical response of knitted fabric. We thus proceed to study the population of KnitCity, introducing a policy through which the mayor of the town can decide to either keep people in, which in case of large events causes human loss, or evacuate the city, which costs a daily fee. The policy only relies on past seismic observations. We construct efficient policies using a reinforcement learning environment and various time-series predictors based on artificial neural networks. By inducing a physically motivated metric on the predictors, this mechanism allows quantitative assessment and comparison of their relevance in the decision-making process.  ( 2 min )
    Morphological Wobbling Can Help Robots Learn. (arXiv:2205.02811v1 [cs.LG])
    We propose to make the physical characteristics of a robot oscillate while it learns to improve its behavioral performance. We consider quantities such as mass, actuator strength, and size that are usually fixed in a robot, and show that when those quantities oscillate at the beginning of the learning process on a simulated 2D soft robot, the performance on a locomotion task can be significantly improved. We investigate the dynamics of the phenomenon and conclude that in our case, surprisingly, a high-frequency oscillation with a large amplitude for a large portion of the learning duration leads to the highest performance benefits. Furthermore, we show that morphological wobbling significantly increases exploration of the search space.  ( 2 min )
    Quantum Extremal Learning. (arXiv:2205.02807v1 [quant-ph])
    We propose a quantum algorithm for 'extremal learning', which is the process of finding the input to a hidden function that extremizes the function output, without having direct access to the hidden function, given only partial input-output (training) data. The algorithm, called quantum extremal learning (QEL), consists of a parametric quantum circuit that is variationally trained to model data input-output relationships and where a trainable quantum feature map, that encodes the input data, is analytically differentiated in order to find the coordinate that extremizes the model. This enables the combination of established quantum machine learning modelling with established quantum optimization, on a single circuit/quantum computer. We have tested our algorithm on a range of classical datasets based on either discrete or continuous input variables, both of which are compatible with the algorithm. In case of discrete variables, we test our algorithm on synthetic problems formulated based on Max-Cut problem generators and also considering higher order correlations in the input-output relationships. In case of the continuous variables, we test our algorithm on synthetic datasets in 1D and simple ordinary differential functions. We find that the algorithm is able to successfully find the extremal value of such problems, even when the training dataset is sparse or a small fraction of the input configuration space. We additionally show how the algorithm can be used for much more general cases of higher dimensionality, complex differential equations, and with full flexibility in the choice of both modeling and optimization ansatz. We envision that due to its general framework and simple construction, the QEL algorithm will be able to solve a wide variety of applications in different fields, opening up areas of further research.  ( 2 min )
    Unsupervised Mismatch Localization in Cross-Modal Sequential Data. (arXiv:2205.02670v1 [cs.LG])
    Content mismatch usually occurs when data from one modality is translated to another, e.g. language learners producing mispronunciations (errors in speech) when reading a sentence (target text) aloud. However, most existing alignment algorithms assume the content involved in the two modalities is perfectly matched and thus have difficulty locating such mismatches between speech and text. In this work, we develop an unsupervised learning algorithm that can infer the relationship between content-mismatched cross-modal sequential data, especially for speech-text sequences. More specifically, we propose a hierarchical Bayesian deep learning model, named mismatch localization variational autoencoder (ML-VAE), that decomposes the generative process of the speech into hierarchically structured latent variables, indicating the relationship between the two modalities. Training such a model is very challenging due to the discrete latent variables with complex dependencies involved. We propose a novel and effective training procedure which alternates between estimating the hard assignments of the discrete latent variables over a specifically designed lattice and updating the parameters of the neural networks. Our experimental results show that ML-VAE successfully locates the mismatch between text and speech, without the need for human annotations for model training.  ( 2 min )
    Towards Fast Simulation of Environmental Fluid Mechanics with Multi-Scale Graph Neural Networks. (arXiv:2205.02637v1 [physics.flu-dyn])
    Numerical simulators are essential tools in the study of natural fluid systems, but their performance often limits application in practice. Recent machine-learning approaches have demonstrated their ability to accelerate spatio-temporal predictions, although with only moderate accuracy in comparison. Here we introduce MultiScaleGNN, a novel multi-scale graph neural network model for learning to infer unsteady continuum mechanics in problems encompassing a range of length scales and complex boundary geometries. We demonstrate this method on advection problems and incompressible fluid dynamics, both fundamental phenomena in oceanic and atmospheric processes. Our results show good extrapolation to new domain geometries and parameters for long-term temporal simulations. Simulations obtained with MultiScaleGNN are between two and four orders of magnitude faster than those on which it was trained.  ( 2 min )
    When Fair Ranking Meets Uncertain Inference. (arXiv:2105.02091v2 [cs.IR] UPDATED)
    Existing fair ranking systems, especially those designed to be demographically fair, assume that accurate demographic information about individuals is available to the ranking algorithm. In practice, however, this assumption may not hold -- in real-world contexts like ranking job applicants or credit seekers, social and legal barriers may prevent algorithm operators from collecting peoples' demographic information. In these cases, algorithm operators may attempt to infer peoples' demographics and then supply these inferences as inputs to the ranking algorithm. In this study, we investigate how uncertainty and errors in demographic inference impact the fairness offered by fair ranking algorithms. Using simulations and three case studies with real datasets, we show how demographic inferences drawn from real systems can lead to unfair rankings. Our results suggest that developers should not use inferred demographic data as input to fair ranking algorithms, unless the inferences are extremely accurate.  ( 2 min )
    What is Right for Me is Not Yet Right for You: A Dataset for Grounding Relative Directions via Multi-Task Learning. (arXiv:2205.02671v1 [cs.CV])
    Understanding spatial relations is essential for intelligent agents to act and communicate in the physical world. Relative directions are spatial relations that describe the relative positions of target objects with regard to the intrinsic orientation of reference objects. Grounding relative directions is more difficult than grounding absolute directions because it not only requires a model to detect objects in the image and to identify spatial relation based on this information, but it also needs to recognize the orientation of objects and integrate this information into the reasoning process. We investigate the challenging problem of grounding relative directions with end-to-end neural networks. To this end, we provide GRiD-3D, a novel dataset that features relative directions and complements existing visual question answering (VQA) datasets, such as CLEVR, that involve only absolute directions. We also provide baselines for the dataset with two established end-to-end VQA models. Experimental evaluations show that answering questions on relative directions is feasible when questions in the dataset simulate the necessary subtasks for grounding relative directions. We discover that those subtasks are learned in an order that reflects the steps of an intuitive pipeline for processing relative directions.  ( 2 min )
    Rethinking Classifier And Adversarial Attack. (arXiv:2205.02743v1 [cs.LG])
    Various defense models have been proposed to resist adversarial attack algorithms, but existing adversarial robustness evaluation methods always overestimate the adversarial robustness of these models (i.e. not approaching the lower bound of robustness). To solve this problem, this paper first uses the Decouple Space method to divide the classifier into two parts: non-linear and linear. On this basis, this paper defines the representation vector of the original example (and its space, i.e., the representation space) and uses Absolute Classification Boundaries Initialization (ACBI) iterative optimization to obtain a better attack starting point (i.e. attacking from this point can approach the lower bound of robustness faster). In particular, this paper applies ACBI to nearly 50 widely-used defense models (including 8 architectures). Experimental results show that ACBI achieves lower robust accuracy in all cases.  ( 2 min )
    A collection of invited non-archival papers for the Conference on Health, Inference, and Learning (CHIL) 2022. (arXiv:2205.02752v1 [cs.LG])
    A collection of invited non-archival papers for the Conference on Health, Inference, and Learning (CHIL) 2022. This index is incomplete as some authors of invited non-archival presentations opted not to include their papers in this index.
    Chemoreception and chemotaxis of a three-sphere swimmer. (arXiv:2205.02678v1 [cs.LG])
    The coupled problem of hydrodynamics and solute transport for the Najafi-Golestanian three-sphere swimmer is studied, with the Reynolds number set to zero and Péclet numbers (Pe) ranging from 0.06 to 60. The adopted method is the numerical simulation of the problem with a finite element code based upon the FEniCS library. For the swimmer executing the optimal locomotion gait, we report the Sherwood number as a function of Pe in homogeneous fluids and confirm that little gain in solute flux is achieved by swimming unless Pe is significantly larger than 10. We also consider the swimmer as a learning agent moving inside a fluid that has a concentration gradient. The outcomes of Q-learning processes show that learning locomotion (with the displacement as reward) is significantly easier than learning chemotaxis (with the increase of solute flux as reward). The chemotaxis problem, even at low Pe, has a varying environment that renders learning more difficult. Further, the learning difficulty increases severely with the Péclet number. The results demonstrate the challenges that natural and artificial swimmers need to overcome to migrate efficiently when exposed to chemical inhomogeneities.  ( 2 min )
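    As a rough illustration of the Q-learning update mentioned in the abstract above, here is a minimal tabular sketch. It does not reproduce the swimmer, its gait states, or the solute-flux reward: the 1D chain environment, the reward of 1.0 at the rightmost state, and all hyperparameters are hypothetical stand-ins chosen only to show the update rule.

```python
import random

def q_learning(n_states=5, n_actions=2, episodes=500,
               alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1D chain (not the swimmer model).

    Action 1 moves one cell right, action 0 moves one cell left (walls clamp).
    Reaching the rightmost cell yields reward 1.0 and ends the episode.
    """
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        for _ in range(100):  # cap on steps per episode
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)          # explore
            else:
                best = max(q[s])                      # exploit, random tie-break
                a = rng.choice([k for k in range(n_actions) if q[s][k] == best])
            s_next = min(s + 1, goal) if a == 1 else max(s - 1, 0)
            r = 1.0 if s_next == goal else 0.0
            # Q-learning update: bootstrap from the best action in the next state
            q[s][a] += alpha * (r + gamma * max(q[s_next]) - q[s][a])
            s = s_next
            if s == goal:
                break
    return q

if __name__ == "__main__":
    table = q_learning()
    print([max(range(2), key=lambda k: row[k]) for row in table[:-1]])
```

    On this toy chain the greedy action in every non-terminal state ends up being "move right". A varying environment of the kind the abstract describes for chemotaxis would make the reward depend on more than the agent's own state, which is one reason such tasks are harder for a tabular learner.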
    Holistic Approach to Measure Sample-level Adversarial Vulnerability and its Utility in Building Trustworthy Systems. (arXiv:2205.02604v1 [cs.CV])
    Adversarial attack perturbs an image with an imperceptible noise, leading to incorrect model prediction. Recently, a few works showed inherent bias associated with such attack (robustness bias), where certain subgroups in a dataset (e.g. based on class, gender, etc.) are less robust than others. This bias not only persists even after adversarial training, but often results in severe performance discrepancies across these subgroups. Existing works characterize the subgroup's robustness bias by only checking individual sample's proximity to the decision boundary. In this work, we argue that this measure alone is not sufficient and validate our argument via extensive experimental analysis. It has been observed that adversarial attacks often corrupt the high-frequency components of the input image. We, therefore, propose a holistic approach for quantifying adversarial vulnerability of a sample by combining these different perspectives, i.e., degree of model's reliance on high-frequency features and the (conventional) sample-distance to the decision boundary. We demonstrate that by reliably estimating adversarial vulnerability at the sample level using the proposed holistic metric, it is possible to develop a trustworthy system where humans can be alerted about the incoming samples that are highly likely to be misclassified at test time. This is achieved with better precision when our holistic metric is used over individual measures. To further corroborate the utility of the proposed holistic approach, we perform knowledge distillation in a limited-sample setting. We observe that the student network trained with the subset of samples selected using our combined metric performs better than both the competing baselines, viz., where samples are selected randomly or based on their distances to the decision boundary.  ( 2 min )
    PyDaddy: A Python package for discovering stochastic dynamical equations from timeseries data. (arXiv:2205.02645v1 [q-bio.QM])
    Most real-world ecological dynamics, ranging from ecosystem dynamics to collective animal movement, are inherently stochastic in nature. Stochastic differential equations (SDEs) are a popular modelling framework to model dynamics with intrinsic randomness. Here, we focus on the inverse question: If one has empirically measured time-series data from some system of interest, is it possible to discover the SDE model that best describes the data? Here, we present PyDaddy (PYthon library for DAta Driven DYnamics), a toolbox to construct and analyze interpretable SDE models based on time-series data. We combine traditional approaches for data-driven SDE reconstruction with an equation learning approach, to derive symbolic equations governing the stochastic dynamics. The toolkit is presented as an open-source Python library, and consists of tools to construct and analyze SDEs. Functionality is included for visual examination of the stochastic structure of the data, guided extraction of the functional form of the SDE, and diagnosis and debugging of the underlying assumptions and the extracted model. Using simulated time-series datasets, exhibiting a wide range of dynamics, we show that PyDaddy is able to correctly identify underlying SDE models. We demonstrate the applicability of the toolkit to real-world data using previously published movement data of a fish school. Starting from the time-series of the observed polarization of the school, PyDaddy readily discovers the SDE model governing the dynamics of group polarization. The model recovered by PyDaddy is consistent with the previous study. In summary, stochastic and noise-induced effects are central to the dynamics of many biological systems. In this context, we present an easy-to-use package to reconstruct SDEs from time-series data.  ( 2 min )
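    To make the inverse problem in the abstract above concrete, the sketch below estimates drift and diffusion from a simulated time series using conditional moments of the increments. It does not use or imitate the PyDaddy API. The Ornstein-Uhlenbeck test process, the assumption of a linear drift and constant diffusion, and the function names `simulate_ou` and `estimate_drift_diffusion` are all illustrative choices, not taken from the paper.

```python
import numpy as np

def simulate_ou(n=200_000, dt=0.01, theta=1.0, sigma=0.5, seed=0):
    """Euler-Maruyama simulation of dx = -theta * x dt + sigma dW (test data only)."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(n - 1) * np.sqrt(dt)
    x = np.zeros(n)
    for i in range(n - 1):
        x[i + 1] = x[i] - theta * x[i] * dt + sigma * noise[i]
    return x

def estimate_drift_diffusion(x, dt):
    """Fit a linear drift f(x) = a*x + b and a constant squared diffusion g2.

    Drift:     f(x) ~ E[dx | x] / dt, fitted here by least squares.
    Diffusion: g2   ~ E[(dx - f(x) dt)^2] / dt.
    """
    dx = np.diff(x)
    a, b = np.polyfit(x[:-1], dx / dt, 1)     # highest-degree coefficient first
    residual = dx - (a * x[:-1] + b) * dt
    g2 = np.mean(residual ** 2) / dt
    return a, b, g2

if __name__ == "__main__":
    series = simulate_ou()
    a, b, g2 = estimate_drift_diffusion(series, dt=0.01)
    print(f"drift slope ~ {a:.3f}, intercept ~ {b:.3f}, diffusion^2 ~ {g2:.3f}")
```

    With the default parameters the fitted slope should land near -1 and the squared diffusion near 0.25, up to sampling noise. A symbolic equation-learning step, as the abstract describes, would replace the fixed linear form with a search over candidate terms.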
    Compressive Ptychography using Deep Image and Generative Priors. (arXiv:2205.02397v1 [cs.CV])
    Ptychography is a well-established coherent diffraction imaging technique that enables non-invasive imaging of samples at a nanometer scale. It has been extensively used in various areas such as the defense industry or materials science. One major limitation of ptychography is the long data acquisition time due to mechanical scanning of the sample; therefore, approaches to reduce the scan points are highly desired. However, reconstructions with fewer scan points lead to imaging artifacts and significant distortions, hindering a quantitative evaluation of the results. To address this bottleneck, we propose a generative model combining deep image priors with deep generative priors. The self-training approach optimizes the deep generative neural network to create a solution for a given dataset. We complement our approach with a prior acquired from a previously trained discriminator network to avoid a possible divergence from the desired output caused by the noise in the measurements. We also suggest using the total variation as a complementary prior to combat artifacts due to measurement noise. We analyze our approach with numerical experiments through different probe overlap percentages and varying noise levels. We also demonstrate improved reconstruction accuracy compared to the state-of-the-art method and discuss the advantages and disadvantages of our approach.  ( 2 min )
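    The total variation term mentioned in the abstract above can be illustrated in a few lines. The abstract does not say which variant is used, so the anisotropic (sum of absolute differences) form below is an assumption, and the function name `total_variation` is hypothetical.

```python
import numpy as np

def total_variation(img):
    """Anisotropic total variation of a 2D array.

    Sums absolute differences between vertically and horizontally adjacent
    pixels. Noisy images score high; piecewise-constant images score low.
    """
    img = np.asarray(img, dtype=float)
    vertical = np.abs(np.diff(img, axis=0)).sum()
    horizontal = np.abs(np.diff(img, axis=1)).sum()
    return float(vertical + horizontal)

if __name__ == "__main__":
    step = np.array([[0, 0, 1, 1],
                     [0, 0, 1, 1]])
    print(total_variation(step))              # one unit jump per row
    print(total_variation(np.zeros((4, 4))))  # flat image
```

    Added to a reconstruction loss with a small weight, a term like this penalizes pixel-to-pixel noise while leaving sharp edges relatively cheap, which is why it is a common choice against measurement noise.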
    PI-NLF: A Proportional-Integral Approach for Non-negative Latent Factor Analysis. (arXiv:2205.02591v1 [cs.LG])
    A high-dimensional and incomplete (HDI) matrix frequently appears in various big-data-related applications, which demonstrates the inherently non-negative interactions among numerous nodes. A non-negative latent factor (NLF) model performs efficient representation learning to an HDI matrix, whose learning process mostly relies on a single latent factor-dependent, non-negative and multiplicative update (SLF-NMU) algorithm. However, an SLF-NMU algorithm updates a latent factor based on the current update increment only without appropriate considerations of past learning information, resulting in slow convergence. Inspired by the prominent success of a proportional-integral (PI) controller in various applications, this paper proposes a Proportional-Integral-incorporated Non-negative Latent Factor (PI-NLF) model with two-fold ideas: a) establishing an Increment Refinement (IR) mechanism via considering the past update increments following the principle of a PI controller; and b) designing an IR-based SLF-NMU (ISN) algorithm to accelerate the convergence rate of a resultant model. Empirical studies on four HDI datasets demonstrate that a PI-NLF model outperforms the state-of-the-art models in both computational efficiency and estimation accuracy for missing data of an HDI matrix. Hence, this study unveils the feasibility of boosting the performance of a non-negative learning algorithm through an error feedback controller.  ( 2 min )
    Model-Based Deep Learning: On the Intersection of Deep Learning and Optimization. (arXiv:2205.02640v1 [eess.SP])
    Decision making algorithms are used in a multitude of different applications. Conventional approaches for designing decision algorithms employ principled and simplified modelling, based on which one can determine decisions via tractable optimization. More recently, deep learning approaches that use highly parametric architectures tuned from data without relying on mathematical models, are becoming increasingly popular. Model-based optimization and data-centric deep learning are often considered to be distinct disciplines. Here, we characterize them as edges of a continuous spectrum varying in specificity and parameterization, and provide a tutorial-style presentation to the methodologies lying in the middle ground of this spectrum, referred to as model-based deep learning. We accompany our presentation with running examples in super-resolution and stochastic control, and show how they are expressed using the provided characterization and specialized in each of the detailed methodologies. The gains of combining model-based optimization and deep learning are demonstrated using experimental results in various applications, ranging from biomedical imaging to digital communications.  ( 2 min )
    Learning to Solve Vehicle Routing Problems: A Survey. (arXiv:2205.02453v1 [cs.LG])
    This paper provides a systematic overview of machine learning methods applied to solve NP-hard Vehicle Routing Problems (VRPs). Recently, there has been a great interest from both machine learning and operations research communities to solve VRPs either by pure learning methods or by combining them with the traditional hand-crafted heuristics. We present the taxonomy of the studies for learning paradigms, solution structures, underlying models, and algorithms. We present in detail the results of the state-of-the-art methods demonstrating their competitiveness with the traditional methods. The paper outlines the future research directions to incorporate learning-based solutions to overcome the challenges of modern transportation systems.  ( 2 min )
    COGMEN: COntextualized GNN based Multimodal Emotion recognitioN. (arXiv:2205.02455v1 [cs.CL])
    Emotions are an inherent part of human interactions, and consequently, it is imperative to develop AI systems that understand and recognize human emotions. During a conversation involving various people, a person's emotions are influenced by the other speaker's utterances and their own emotional state over the utterances. In this paper, we propose COntextualized Graph Neural Network based Multimodal Emotion recognitioN (COGMEN) system that leverages local information (i.e., inter/intra dependency between speakers) and global information (context). The proposed model uses Graph Neural Network (GNN) based architecture to model the complex dependencies (local and global information) in a conversation. Our model gives state-of-the-art (SOTA) results on IEMOCAP and MOSEI datasets, and detailed ablation experiments show the importance of modeling information at both levels.  ( 2 min )
    Contrastive Multi-view Hyperbolic Hierarchical Clustering. (arXiv:2205.02618v1 [cs.LG])
    Hierarchical clustering recursively partitions data at an increasingly finer granularity. In real-world applications, multi-view data have become increasingly important. This raises a less investigated problem, i.e., multi-view hierarchical clustering, to better understand the hierarchical structure of multi-view data. To this end, we propose a novel neural network-based model, namely Contrastive Multi-view Hyperbolic Hierarchical Clustering (CMHHC). It consists of three components, i.e., multi-view alignment learning, aligned feature similarity learning, and continuous hyperbolic hierarchical clustering. First, we align sample-level representations across multiple views in a contrastive way to capture the view-invariance information. Next, we utilize both the manifold and Euclidean similarities to improve the metric property. Then, we embed the representations into a hyperbolic space and optimize the hyperbolic embeddings via a continuous relaxation of hierarchical clustering loss. Finally, a binary clustering tree is decoded from optimized hyperbolic embeddings. Experimental results on five real-world datasets demonstrate the effectiveness of the proposed method and its components.  ( 2 min )
    GANimator: Neural Motion Synthesis from a Single Sequence. (arXiv:2205.02625v1 [cs.GR])
    We present GANimator, a generative model that learns to synthesize novel motions from a single, short motion sequence. GANimator generates motions that resemble the core elements of the original motion, while simultaneously synthesizing novel and diverse movements. Existing data-driven techniques for motion synthesis require a large motion dataset which contains the desired and specific skeletal structure. By contrast, GANimator only requires training on a single motion sequence, enabling novel motion synthesis for a variety of skeletal structures e.g., bipeds, quadropeds, hexapeds, and more. Our framework contains a series of generative and adversarial neural networks, each responsible for generating motions in a specific frame rate. The framework progressively learns to synthesize motion from random noise, enabling hierarchical control over the generated motion content across varying levels of detail. We show a number of applications, including crowd simulation, key-frame editing, style transfer, and interactive control, which all learn from a single input sequence. Code and data for this paper are at https://peizhuoli.github.io/ganimator.  ( 2 min )
    On Disentangled and Locally Fair Representations. (arXiv:2205.02673v1 [cs.LG])
    We study the problem of performing classification in a manner that is fair for sensitive groups, such as race and gender. This problem is tackled through the lens of disentangled and locally fair representations. We learn a locally fair representation, such that, under the learned representation, the neighborhood of each sample is balanced in terms of the sensitive attribute. For instance, when a decision is made to hire an individual, we ensure that the $K$ most similar hired individuals are racially balanced. Crucially, we ensure that similar individuals are found based on attributes not correlated to their race. To this end, we disentangle the embedding space into two representations, the first of which is correlated with the sensitive attribute while the second is not. We apply our local fairness objective only to the second, uncorrelated, representation. Through a set of experiments, we demonstrate the necessity of both disentangled and local fairness for obtaining fair and accurate representations. We evaluate our method on real-world settings such as predicting income and re-incarceration rate and demonstrate the advantage of our method.  ( 2 min )
    Natural Language Inference with Self-Attention for Veracity Assessment of Pandemic Claims. (arXiv:2205.02596v1 [cs.CL])
    We present a comprehensive work on automated veracity assessment from dataset creation to developing novel methods based on Natural Language Inference (NLI), focusing on misinformation related to the COVID-19 pandemic. We first describe the construction of the novel PANACEA dataset consisting of heterogeneous claims on COVID-19 and their respective information sources. The dataset construction includes work on retrieval techniques and similarity measurements to ensure a unique set of claims. We then propose novel techniques for automated veracity assessment based on Natural Language Inference including graph convolutional networks and attention based approaches. We have carried out experiments on evidence retrieval and veracity assessment on the dataset using the proposed techniques and found them competitive with SOTA methods, and provided a detailed discussion.  ( 2 min )
    FastRE: Towards Fast Relation Extraction with Convolutional Encoder and Improved Cascade Binary Tagging Framework. (arXiv:2205.02490v1 [cs.CL])
    Recent work on extracting relations from texts has achieved excellent performance. However, most existing methods pay less attention to efficiency, making it still challenging to quickly extract relations from massive or streaming text data in realistic scenarios. The main efficiency bottleneck is that these methods use a Transformer-based pre-trained language model for encoding, which heavily affects the training speed and inference speed. To address this issue, we propose a fast relation extraction model (FastRE) based on a convolutional encoder and an improved cascade binary tagging framework. Compared to previous work, FastRE employs several innovations to improve efficiency while also keeping promising performance. Concretely, FastRE adopts a novel convolutional encoder architecture combined with dilated convolution, gated unit and residual connection, which significantly reduces the computation cost of training and inference, while maintaining satisfactory performance. Moreover, to improve the cascade binary tagging framework, FastRE first introduces a type-relation mapping mechanism to accelerate tagging efficiency and alleviate relation redundancy, and then utilizes a position-dependent adaptive thresholding strategy to obtain higher tagging accuracy and better model generalization. Experimental results demonstrate that FastRE is well balanced between efficiency and performance, and achieves 3-10x faster training, 7-15x faster inference, and 1/100 of the parameters compared to the state-of-the-art models, while the performance is still competitive.  ( 2 min )
    Optimal Algorithms for Mean Estimation under Local Differential Privacy. (arXiv:2205.02466v1 [cs.LG])
    We study the problem of mean estimation of $\ell_2$-bounded vectors under the constraint of local differential privacy. While the literature has a variety of algorithms that achieve the asymptotically optimal rates for this problem, the performance of these algorithms in practice can vary significantly due to varying (and often large) hidden constants. In this work, we investigate the question of designing the protocol with the smallest variance. We show that PrivUnit (Bhowmick et al. 2018) with optimized parameters achieves the optimal variance among a large family of locally private randomizers. To prove this result, we establish some properties of local randomizers, and use symmetrization arguments that allow us to write the optimal randomizer as the optimizer of a certain linear program. These structural results, which should extend to other problems, then allow us to show that the optimal randomizer belongs to the PrivUnit family. We also develop a new variant of PrivUnit based on the Gaussian distribution which is more amenable to mathematical analysis and enjoys the same optimality guarantees. This allows us to establish several useful properties on the exact constants of the optimal error as well as to numerically estimate these constants.  ( 2 min )
    dPRO: A Generic Profiling and Optimization System for Expediting Distributed DNN Training. (arXiv:2205.02473v1 [cs.DC])
    Distributed training using multiple devices (i.e., GPU servers) has been widely adopted for learning DNN models over large datasets. However, the performance of large-scale distributed training tends to be far from linear speed-up in practice. Given the complexity of distributed systems, it is challenging to identify the root cause(s) of inefficiency and exercise effective performance optimizations when unexpected low training speed occurs. To date, there exists no software tool which diagnoses performance issues and helps expedite distributed DNN training, while the training can be run using different machine learning frameworks. This paper proposes dPRO, a toolkit that includes: (1) an efficient profiler that collects runtime traces of distributed DNN training across multiple frameworks, especially fine-grained communication traces, and constructs global data flow graphs including detailed communication operations for accurate replay; (2) an optimizer that effectively identifies performance bottlenecks and explores optimization strategies (from computation, communication and memory aspects) for training acceleration. We implement dPRO on multiple deep learning frameworks (PyTorch, TensorFlow, MXNet) and representative communication schemes (AllReduce and Parameter Server architecture). Extensive experiments show that dPRO predicts performance of distributed training in various settings with < 5% errors in most cases and finds optimization strategies with up to 87.1% speed-up over the baselines.  ( 2 min )
    A Temporal-Pattern Backdoor Attack to Deep Reinforcement Learning. (arXiv:2205.02589v1 [cs.LG])
    Deep reinforcement learning (DRL) has made significant achievements in many real-world applications. But these real-world applications typically can only provide partial observations for making decisions due to occlusions and noisy sensors. However, partial state observability can be used to hide malicious behaviors for backdoors. In this paper, we explore the sequential nature of DRL and propose a novel temporal-pattern backdoor attack to DRL, whose trigger is a set of temporal constraints on a sequence of observations rather than a single observation, and whose effect can be sustained for a controllable duration rather than a single instant. We validate our proposed backdoor attack on a typical job scheduling task in cloud computing. Numerous experimental results show that our backdoor can achieve excellent effectiveness, stealthiness, and sustainability. Our backdoor's average clean data accuracy and attack success rate can reach 97.8% and 97.5%, respectively.  ( 2 min )
    Assistive Recipe Editing through Critiquing. (arXiv:2205.02454v1 [cs.CL])
    There has recently been growing interest in the automatic generation of cooking recipes that satisfy some form of dietary restrictions, thanks in part to the availability of online recipe data. Prior studies have used pre-trained language models, or relied on small paired recipe data (e.g., a recipe paired with a similar one that satisfies a dietary constraint). However, pre-trained language models generate inconsistent or incoherent recipes, and paired datasets are not available at scale. We address these deficiencies with RecipeCrit, a hierarchical denoising auto-encoder that edits recipes given ingredient-level critiques. The model is trained for recipe completion to learn semantic relationships within recipes. Our work's main innovation is our unsupervised critiquing module that allows users to edit recipes by interacting with the predicted ingredients; the system iteratively rewrites recipes to satisfy users' feedback. Experiments on the Recipe1M recipe dataset show that our model can more effectively edit recipes compared to strong language-modeling baselines, creating recipes that satisfy user constraints and are more correct, serendipitous, coherent, and relevant as measured by human judges.  ( 2 min )
    Spot-adaptive Knowledge Distillation. (arXiv:2205.02399v1 [cs.CV])
    Knowledge distillation (KD) has become a well established paradigm for compressing deep neural networks. The typical way of conducting knowledge distillation is to train the student network under the supervision of the teacher network to harness the knowledge at one or multiple spots (i.e., layers) in the teacher network. The distillation spots, once specified, will not change for all the training samples, throughout the whole distillation process. In this work, we argue that distillation spots should be adaptive to training samples and distillation epochs. We thus propose a new distillation strategy, termed spot-adaptive KD (SAKD), to adaptively determine the distillation spots in the teacher network per sample, at every training iteration during the whole distillation period. As SAKD actually focuses on "where to distill" instead of "what to distill" that is widely investigated by most existing works, it can be seamlessly integrated into existing distillation methods to further improve their performance. Extensive experiments with 10 state-of-the-art distillers are conducted to demonstrate the effectiveness of SAKD for improving their distillation performance, under both homogeneous and heterogeneous distillation settings. Code is available at https://github.com/zju-vipa/spot-adaptive-pytorch  ( 2 min )
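    For readers unfamiliar with the "typical way" of distilling that the abstract above refers to, here is a minimal sketch of a logit-based distillation loss: the KL divergence between temperature-softened teacher and student distributions. This is the common baseline formulation, not SAKD's spot-selection mechanism, and the temperature value and function names are assumptions for illustration.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis, with temperature scaling."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """Mean KL(teacher || student) on softened outputs, scaled by T^2.

    The T^2 factor keeps gradient magnitudes comparable across temperatures.
    """
    p = softmax(teacher_logits, temperature)   # soft targets from the teacher
    q = softmax(student_logits, temperature)   # student predictions
    kl = np.sum(p * (np.log(p) - np.log(q)), axis=-1)
    return float(np.mean(kl) * temperature ** 2)

if __name__ == "__main__":
    teacher = np.array([[5.0, 1.0, 0.0], [0.5, 2.5, 0.2]])
    student = np.array([[2.0, 1.5, 0.5], [1.0, 1.0, 1.0]])
    print(kd_loss(student, teacher))   # positive: student differs from teacher
    print(kd_loss(teacher, teacher))   # zero: identical outputs
```

    Feature-based methods apply an analogous matching loss at intermediate layers; choosing which of those layers to match per sample is the "where to distill" question the abstract raises.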
    Soft and Hard Constrained Parametric Generative Schemes for Encoding and Synthesizing Airfoils. (arXiv:2205.02458v1 [physics.flu-dyn])
    Traditional airfoil parametric techniques have significant limitations in modern aerodynamic optimization design. There is a strong demand for developing a parametric method with good intuitiveness, flexibility and representative accuracy. In this paper, two parametric generative schemes based on deep learning methods are proposed to represent the complicated design space under specific constraints. 1. Soft-constrained scheme: The CVAE-based model trains geometric constraints as part of the network and can provide constrained airfoil synthesis; 2. Hard-constrained scheme: The VAE-based model serves to generate diverse airfoils, while an FFD-based technique projects the generated airfoils to the final airfoils satisfying the given constraints. The statistical results show that the reconstructed airfoils are accurate and smooth without extra filters. The soft-constrained scheme tends to synthesize and explore airfoils efficiently and effectively, concentrating around the reference airfoil in both geometry space and objective space. The constraints are loosened slightly because of the inherent properties of the model. The hard-constrained scheme tends to generate and explore airfoils in a wider range for both geometry space and objective space, and the distribution in objective space is closer to a normal distribution. The airfoils synthesized through this scheme strictly conform to the constraints, though the projection may produce some odd airfoil shapes.  ( 2 min )
    Response Component Analysis for Sea State Estimation Using Artificial Neural Networks and Vessel Response Spectral Data. (arXiv:2205.02375v1 [cs.LG])
    The use of the "ship as a wave buoy" analogy (SAWB) provides a novel means to estimate sea states, where relationships are established between causal wave properties and vessel motion response information. This study focuses on a model-free machine learning approach to SAWB-based sea state estimation (SSE), using neural networks (NNs) to map vessel response spectral data to statistical wave properties. Results showed a strong correlation between heave responses and significant wave height estimates, whilst the accuracy of mean wave period and wave heading predictions was observed to improve considerably when data from multiple vessel degrees of freedom (DOFs) were utilized. Overall, 3-DOF (heave, pitch and roll) NNs for SSE were shown to perform well when compared to existing SSE approaches that use similar simulation setups. Given the information-dense statistical representation of vessel motion responses in spectral form, as well as the ability of NNs to effectively model complex relationships between variables, the designed SSE method shows promise for future adaptation to mobile SSE systems using the SAWB approach.  ( 2 min )
    Alignahead: Online Cross-Layer Knowledge Extraction on Graph Neural Networks. (arXiv:2205.02468v1 [cs.LG])
    Existing knowledge distillation methods on graph neural networks (GNNs) are almost all offline, where the student model extracts knowledge from a powerful teacher model to improve its performance. However, a pre-trained teacher model is not always accessible due to training cost, privacy, etc. In this paper, we propose a novel online knowledge distillation framework to resolve this problem. Specifically, each student GNN model learns the extracted local structure from another simultaneously trained counterpart in an alternating training procedure. We further develop a cross-layer distillation strategy by aligning ahead one student layer with a layer at a different depth of another student model, which theoretically makes the structure information spread over all layers. Experimental results on five datasets including PPI, Coauthor-CS/Physics and Amazon-Computer/Photo demonstrate that the student performance is consistently boosted in our collaborative training framework without the supervision of a pre-trained teacher model. In addition, we also find that our alignahead technique can accelerate model convergence and that its effectiveness can generally be improved by increasing the number of students in training. Code is available: https://github.com/GuoJY-eatsTG/Alignahead  ( 2 min )
    DeepExtrema: A Deep Learning Approach for Forecasting Block Maxima in Time Series Data. (arXiv:2205.02441v1 [cs.LG])
    Accurate forecasting of extreme values in time series is critical due to the significant impact of extreme events on human and natural systems. This paper presents DeepExtrema, a novel framework that combines a deep neural network (DNN) with generalized extreme value (GEV) distribution to forecast the block maximum value of a time series. Implementing such a network is a challenge as the framework must preserve the inter-dependent constraints among the GEV model parameters even when the DNN is initialized. We describe our approach to address this challenge and present an architecture that enables both conditional mean and quantile prediction of the block maxima. The extensive experiments performed on both real-world and synthetic data demonstrated the superiority of DeepExtrema compared to other baseline methods.  ( 2 min )
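    The abstract does not spell out the inter-dependent constraints among the GEV parameters. As a minimal sketch, one standard constraint from the textbook GEV parameterization is that the scale must be positive and that 1 + xi * (y - mu) / sigma must be positive for every observed block maximum y. The function names below are hypothetical and this is not the DeepExtrema code; it only illustrates why a freshly initialized network can emit parameters with zero likelihood.

```python
import math


def gev_log_density(y, mu, sigma, xi):
    """Log-density of the GEV distribution (location mu, scale sigma, shape xi).

    Returns -inf when sigma <= 0 or when y lies outside the support,
    i.e. when 1 + xi * (y - mu) / sigma <= 0.
    """
    if sigma <= 0:
        return -math.inf
    z = (y - mu) / sigma
    if abs(xi) < 1e-12:
        # Gumbel limit (xi -> 0): the support is the whole real line.
        return -math.log(sigma) - z - math.exp(-z)
    t = 1.0 + xi * z
    if t <= 0:
        return -math.inf
    return -math.log(sigma) - (1.0 + 1.0 / xi) * math.log(t) - t ** (-1.0 / xi)


def satisfies_gev_constraints(ys, mu, sigma, xi):
    """True if (mu, sigma, xi) gives every observation in ys positive density."""
    return sigma > 0 and all(1.0 + xi * (y - mu) / sigma > 0 for y in ys)
```

    A training loop would typically need some mechanism (reparameterization, penalty, or projection) to keep the predicted parameters inside this feasible region; which mechanism the paper uses is not stated in the abstract.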
    A Deep Learning Approach to Dst Index Prediction. (arXiv:2205.02447v1 [cs.LG])
    The disturbance storm time (Dst) index is an important and useful measurement in space weather research. It has been used to characterize the size and intensity of a geomagnetic storm. A negative Dst value means that the Earth's magnetic field is weakened, which happens during storms. In this paper, we present a novel deep learning method, called the Dst Transformer, to perform short-term, 1-6 hour ahead, forecasting of the Dst index based on the solar wind parameters provided by the NASA Space Science Data Coordinated Archive. The Dst Transformer combines a multi-head attention layer with Bayesian inference, which is capable of quantifying both aleatoric uncertainty and epistemic uncertainty when making Dst predictions. Experimental results show that the proposed Dst Transformer outperforms related machine learning methods in terms of the root mean square error and R-squared. Furthermore, the Dst Transformer can produce both data and model uncertainty quantification results, which cannot be done by the existing methods. To our knowledge, this is the first time that Bayesian deep learning has been used for Dst index forecasting.  ( 2 min )
    Uncertainty-Based Non-Parametric Active Peak Detection. (arXiv:2205.02376v1 [cs.IT])
    Active, non-parametric peak detection is considered. As a use case, active source localization is examined and an uncertainty-based sampling scheme to effectively localize the peak from a few energy measurements is designed. It is shown that under very mild conditions, the source localization error with $m$ actively chosen energy measurements scales as $O(\log^2 m/m)$. Numerically, it is shown that in low-sample regimes, the proposed method enjoys superior performance on several types of data, outperforms the state-of-the-art passive source localization approaches, and can outperform greedy methods as well.  ( 2 min )
    GitRank: A Framework to Rank GitHub Repositories. (arXiv:2205.02360v1 [cs.SE])
    Open-source repositories provide a wealth of information and are increasingly being used to build artificial intelligence (AI) based systems to solve problems in software engineering. Open-source repositories can be of varying quality, and low-quality repositories could degrade the performance of these systems. Evaluating the quality of open-source repositories, which is not available directly on code hosting sites such as GitHub, is thus important. In this hackathon, we utilize known code quality measures and the GrimoireLab toolkit to implement a framework, named GitRank, to rank open-source repositories on three different criteria. We discuss our findings and preliminary evaluation in this hackathon report.  ( 2 min )
    KGTuner: Efficient Hyper-parameter Search for Knowledge Graph Learning. (arXiv:2205.02460v1 [cs.LG])
    While hyper-parameters (HPs) are important for knowledge graph (KG) learning, existing methods fail to search them efficiently. To solve this problem, we first analyze the properties of different HPs and measure the transfer ability from a small subgraph to the full graph. Based on the analysis, we propose an efficient two-stage search algorithm, KGTuner, which efficiently explores HP configurations on a small subgraph at the first stage and transfers the top-performing configurations for fine-tuning on the large full graph at the second stage. Experiments show that our method can consistently find better HPs than the baseline algorithms within the same time budget, achieving a 9.1% average relative improvement for four embedding models on the large-scale KGs in the Open Graph Benchmark.  ( 2 min )
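    The two-stage idea described above can be sketched in a few lines. This is only an illustration under stated assumptions: `cheap_score` stands in for training and evaluating on a small subgraph, `full_score` for fine-tuning on the full graph, and `top_k` is a hypothetical shortlist size. None of these names come from KGTuner itself.

```python
def two_stage_search(configs, cheap_score, full_score, top_k=3):
    """Rank all configs with a cheap proxy, then re-evaluate only the best few.

    configs     : list of hyper-parameter configurations (any objects)
    cheap_score : callable, higher is better; assumed inexpensive (stage 1)
    full_score  : callable, higher is better; assumed expensive (stage 2)
    top_k       : how many stage-1 winners are passed on to stage 2
    """
    # Stage 1: explore every configuration on the cheap proxy.
    ranked = sorted(configs, key=cheap_score, reverse=True)
    shortlist = ranked[:top_k]
    # Stage 2: spend the expensive budget only on the shortlist.
    return max(shortlist, key=full_score)


if __name__ == "__main__":
    configs = [{"lr": lr} for lr in (0.001, 0.01, 0.05, 0.1, 0.5)]
    best = two_stage_search(
        configs,
        cheap_score=lambda c: -abs(c["lr"] - 0.08),  # toy proxy objective
        full_score=lambda c: -abs(c["lr"] - 0.04),   # toy full objective
        top_k=2,
    )
    print(best)
```

    The scheme only pays off when the proxy ranking transfers reasonably well to the full problem, which is the property the abstract says the authors measure first.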
    Multi-Graph based Multi-Scenario Recommendation in Large-scale Online Video Services. (arXiv:2205.02446v1 [cs.AI])
    Recently, industrial recommendation services have been boosted by the continual upgrade of deep learning methods. However, they still face de-biasing challenges such as exposure bias and the cold-start problem, where repeated cycles of machine learning training on human interaction history lead algorithms to repeatedly suggest exposed items while ignoring less-active ones. Additional problems exist in multi-scenario platforms, e.g. appropriate data fusion from subsidiary scenarios, which we observe could be alleviated through graph structured data integration via message passing. In this paper, we present a multi-graph structured multi-scenario recommendation solution, which encapsulates interaction data across scenarios with a multi-graph and obtains representations via graph learning. Extensive offline and online experiments on real-world datasets are conducted, where the proposed method demonstrates an increase of 0.63% and 0.71% in CTR and Video Views per capita on new users over the deployed set of baselines and outperforms the regular method in increasing the number of outer-scenario videos by 25% and video watches by 116%, validating its superiority in activating cold videos and enriching target recommendation.  ( 2 min )
    Convolutional and Residual Networks Provably Contain Lottery Tickets. (arXiv:2205.02343v1 [cs.LG])
    The Lottery Ticket Hypothesis continues to have a profound practical impact on the quest for small scale deep neural networks that solve modern deep learning tasks at competitive performance. These lottery tickets are identified by pruning large randomly initialized neural networks with architectures that are as diverse as their applications. Yet, theoretical insights that attest to their existence have mostly focused on deep fully-connected feed forward networks with ReLU activation functions. We prove that modern architectures consisting of convolutional and residual layers, which can be equipped with almost arbitrary activation functions, also contain lottery tickets with high probability.  ( 2 min )
    KenSwQuAD -- A Question Answering Dataset for Swahili Low Resource Language. (arXiv:2205.02364v1 [cs.CL])
    This research developed a Kencorpus Swahili Question Answering Dataset, KenSwQuAD, from raw data of the Swahili language, which is a low resource language predominantly spoken in Eastern Africa and also has speakers in other parts of the world. Question Answering datasets are important for machine comprehension of natural language processing tasks such as internet search and dialog systems. However, before such machine learning systems can perform these tasks, they need training data such as the gold standard Question Answering (QA) set that is developed in this research. The research engaged annotators to formulate question answer pairs from Swahili texts that had been collected by the Kencorpus project, a Kenyan languages corpus that collected data from three Kenyan languages. The total Swahili data collection had 2,585 texts, out of which we annotated 1,445 story texts with at least 5 QA pairs each, resulting in a final dataset of 7,526 QA pairs. A quality assurance set of 12.5% of the annotated texts was subjected to re-evaluation by different annotators who confirmed that the QA pairs were all correctly annotated. A proof of concept on applying the set to machine learning on the question answering task confirmed that the dataset can be used for such practical tasks. The research therefore developed KenSwQuAD, a question-answer dataset for Swahili that is useful to the natural language processing community who need training and gold standard sets for their machine learning applications. The research also contributed to the resourcing of the Swahili language, which is important for communication around the globe. Updating this set and providing similar sets for other low resource languages is an important area that is worthy of further research.  ( 3 min )
    Machine Learning Operations (MLOps): Overview, Definition, and Architecture. (arXiv:2205.02302v1 [cs.LG])
    The final goal of all industrial machine learning (ML) projects is to develop ML products and rapidly bring them into production. However, it is highly challenging to automate and operationalize ML products and thus many ML endeavors fail to deliver on their expectations. The paradigm of Machine Learning Operations (MLOps) addresses this issue. MLOps includes several aspects, such as best practices, sets of concepts, and development culture. However, MLOps is still a vague term and its consequences for researchers and professionals are ambiguous. To address this gap, we conduct mixed-method research, including a literature review, a tool review, and expert interviews. As a result of these investigations, we provide an aggregated overview of the necessary principles, components, and roles, as well as the associated architecture and workflows. Furthermore, we furnish a definition of MLOps and highlight open challenges in the field. Finally, this work provides guidance for ML researchers and practitioners who want to automate and operate their ML products with a designated set of technologies.  ( 2 min )
    Knowledge Distillation of Russian Language Models with Reduction of Vocabulary. (arXiv:2205.02340v1 [cs.CL])
    Today, transformer language models serve as a core component for the majority of natural language processing tasks. Industrial application of such models requires minimization of computation time and memory footprint. Knowledge distillation is one of the approaches to address this goal. Existing methods in this field mainly focus on reducing the number of layers or the dimension of embeddings/hidden representations. An alternative option is to reduce the number of tokens in the vocabulary and therefore the embeddings matrix of the student model. The main problem with vocabulary minimization is the mismatch between the input sequences and output class distributions of the teacher and student models. As a result, it is impossible to directly apply KL-based knowledge distillation. We propose two simple yet effective alignment techniques to enable knowledge distillation to students with a reduced vocabulary. Evaluation of distilled models on a number of common benchmarks for Russian such as Russian SuperGLUE, SberQuAD, RuSentiment, ParaPhaser, Collection-3 demonstrated that our techniques allow us to achieve compression from $17\times$ to $49\times$, while maintaining the quality of a $1.7\times$ compressed student with the full-sized vocabulary but only a reduced number of Transformer layers. We make our code and distilled models available.  ( 2 min )
    Original or Translated? A Causal Analysis of the Impact of Translationese on Machine Translation Performance. (arXiv:2205.02293v1 [cs.CL])
    Human-translated text displays distinct features from naturally written text in the same language. This phenomenon, known as translationese, has been argued to confound machine translation (MT) evaluation. Yet, we find that existing work on translationese neglects some important factors and the conclusions are mostly correlational but not causal. In this work, we collect CausalMT, a dataset where the MT training data are also labeled with the human translation directions. We inspect two critical factors, the train-test direction match (whether the human translation directions in the training and test sets are aligned), and data-model direction match (whether the model learns in the same direction as the human translation direction in the dataset). We show that these two factors have a large causal effect on the MT performance, in addition to the test-model direction mismatch highlighted by existing work on the impact of translationese. In light of our findings, we provide a set of suggestions for MT training and evaluation. Our code and data are at https://github.com/EdisonNi-hku/CausalMT  ( 2 min )
    Equity and Fairness of Bayesian Knowledge Tracing. (arXiv:2205.02333v1 [cs.LG])
    We consider the equity and fairness of curricula derived from Knowledge Tracing models. We begin by defining a unifying notion of an equitable tutoring system as a system that achieves maximum possible knowledge in minimal time for each student interacting with it. Realizing perfect equity requires tutoring systems that can provide individualized curricula per student. In particular, we investigate the design of equitable tutoring systems that derive their curricula from Knowledge Tracing models. We first show that many existing models, including classical Bayesian Knowledge Tracing (BKT) and Deep Knowledge Tracing (DKT), and their derived curricula can fall short of achieving equitable tutoring. To overcome this issue, we then propose a novel model, Bayesian-Bayesian Knowledge Tracing (BBKT), that naturally enables online individualization and, thereby, more equitable tutoring. We demonstrate that curricula derived from our model are more effective and equitable than those derived from classical BKT models. Furthermore, we highlight that improving models with a focus on the fairness of next-step predictions might be insufficient to develop equitable tutoring systems.  ( 2 min )
    Most Activation Functions Can Win the Lottery Without Excessive Depth. (arXiv:2205.02321v1 [cs.LG])
    The strong lottery ticket hypothesis has highlighted the potential for training deep neural networks by pruning, which has inspired interesting practical and theoretical insights into how neural networks can represent functions. For networks with ReLU activation functions, it has been proven that a target network with depth $L$ can be approximated by the subnetwork of a randomly initialized neural network that has double the target's depth $2L$ and is wider by a logarithmic factor. We show that a depth $L+1$ network is sufficient. This result indicates that we can expect to find lottery tickets at realistic, commonly used depths while only requiring logarithmic overparametrization. Our novel construction approach applies to a large class of activation functions and is not limited to ReLUs.  ( 2 min )
    An Adaptive Incremental Gradient Method With Support for Non-Euclidean Norms. (arXiv:2205.02273v1 [math.OC])
    Stochastic variance reduced methods have shown strong performance in solving finite-sum problems. However, these methods usually require the users to manually tune the step-size, which is time-consuming or even infeasible for some large-scale optimization tasks. To overcome the problem, we propose and analyze several novel adaptive variants of the popular SAGA algorithm. Eventually, we design a variant of Barzilai-Borwein step-size which is tailored for the incremental gradient method to ensure memory efficiency and fast convergence. We establish its convergence guarantees under general settings that allow non-Euclidean norms in the definition of smoothness and the composite objectives, which cover a broad range of applications in machine learning. We improve the analysis of SAGA to support non-Euclidean norms, which fills the void of existing work. Numerical experiments on standard datasets demonstrate a competitive performance of the proposed algorithm compared with existing variance-reduced methods and their adaptive variants.  ( 2 min )
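    For readers unfamiliar with the baseline being extended, here is a minimal sketch of plain SAGA with a fixed, manually chosen step size on a scalar least-squares problem. It does not implement the adaptive Barzilai-Borwein variant or the non-Euclidean analysis from the paper; the function name and the toy problem are assumptions made for illustration.

```python
import random


def saga_least_squares(a, b, step, iters, seed=0):
    """Minimise (1/n) * sum_i 0.5 * (a[i] * x - b[i])**2 over a scalar x with SAGA."""
    rng = random.Random(seed)
    n = len(a)
    x = 0.0
    # Table holding the most recently evaluated gradient of each component f_i.
    table = [a[i] * (a[i] * x - b[i]) for i in range(n)]
    table_mean = sum(table) / n
    for _ in range(iters):
        i = rng.randrange(n)
        grad_i = a[i] * (a[i] * x - b[i])
        # Variance-reduced direction: fresh gradient minus stored one, plus table average.
        x -= step * (grad_i - table[i] + table_mean)
        # Keep the running average consistent with the updated table entry.
        table_mean += (grad_i - table[i]) / n
        table[i] = grad_i
    return x


if __name__ == "__main__":
    a, b = [1.0, 2.0, 3.0], [1.0, 2.0, 2.0]
    L = max(ai * ai for ai in a)  # largest component smoothness constant
    print(saga_least_squares(a, b, step=1.0 / (3.0 * L), iters=5000))
```

    The step size 1/(3L) is the kind of hand-set constant the abstract describes as costly to tune at scale; the table of stored gradients is what makes memory efficiency a concern for incremental methods of this type.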
    Language Models in the Loop: Incorporating Prompting into Weak Supervision. (arXiv:2205.02318v1 [cs.LG])
    We propose a new strategy for applying large pre-trained language models to novel tasks when labeled training data is limited. Rather than apply the model in a typical zero-shot or few-shot fashion, we treat the model as the basis for labeling functions in a weak supervision framework. To create a classifier, we first prompt the model to answer multiple distinct queries about an example and define how the possible responses should be mapped to votes for labels and abstentions. We then denoise these noisy label sources using the Snorkel system and train an end classifier with the resulting training data. Our experimental evaluation shows that prompting large language models within a weak supervision framework can provide significant gains in accuracy. On the WRENCH weak supervision benchmark, this approach can significantly improve over zero-shot performance, an average 19.5% reduction in errors. We also find that this approach produces classifiers with comparable or superior accuracy to those trained from hand-engineered rules.  ( 2 min )
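    The step of mapping prompt responses to votes and abstentions can be sketched as follows. Note the substitution: the abstract says the votes are denoised with the Snorkel system, whereas this sketch uses a plain plurality vote as a simple stand-in, and it does not call any language model. The mapping dictionary and function names are hypothetical.

```python
from collections import Counter

ABSTAIN = None  # sentinel meaning "this labeling function casts no vote"


def response_to_vote(response, mapping):
    """Map a raw model response to a label vote, abstaining on anything unmapped."""
    return mapping.get(response.strip().lower(), ABSTAIN)


def majority_label(votes):
    """Plurality vote over non-abstaining votes; abstain if none vote or on a tie."""
    counts = Counter(v for v in votes if v is not ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


if __name__ == "__main__":
    # Responses to several distinct prompts about one example (toy data).
    mapping = {"yes": "spam", "no": "ham"}
    responses = ["Yes", " no ", "maybe", "yes"]
    votes = [response_to_vote(r, mapping) for r in responses]
    print(votes, majority_label(votes))
```

    A learned label model differs from this stand-in mainly in that it weights sources by their estimated accuracies rather than counting every vote equally.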
    Multivariate Prediction Intervals for Random Forests. (arXiv:2205.02260v1 [stat.ML])
    Accurate uncertainty estimates can significantly improve the performance of iterative design of experiments, as in Sequential and Reinforcement learning. For many such problems in engineering and the physical sciences, the design task depends on multiple correlated model outputs as objectives and/or constraints. To better solve these problems, we propose a recalibrated bootstrap method to generate multivariate prediction intervals for bagged models and show that it is well-calibrated. We apply the recalibrated bootstrap to a simulated sequential learning problem with multiple objectives and show that it leads to a marked decrease in the number of iterations required to find a satisfactory candidate. This indicates that the recalibrated bootstrap could be a valuable tool for practitioners using machine learning to optimize systems with multiple competing targets.  ( 2 min )
    Fine-Grained Address Segmentation for Attention-Based Variable-Degree Prefetching. (arXiv:2205.02269v1 [cs.AR])
    Machine learning algorithms have shown potential to improve prefetching performance by accurately predicting future memory accesses. Existing approaches are based on the modeling of text prediction, considering prefetching as a classification problem for sequence prediction. However, the vast and sparse memory address space leads to large vocabulary, which makes this modeling impractical. The number and order of outputs for multiple cache line prefetching are also fundamentally different from text prediction. We propose TransFetch, a novel way to model prefetching. To reduce vocabulary size, we use fine-grained address segmentation as input. To predict unordered sets of future addresses, we use delta bitmaps for multiple outputs. We apply an attention-based network to learn the mapping between input and output. Prediction experiments demonstrate that address segmentation achieves 26% - 36% higher F1-score than delta inputs and 15% - 24% higher F1-score than page & offset inputs for SPEC 2006, SPEC 2017, and GAP benchmarks. Simulation results show that TransFetch achieves 38.75% IPC improvement compared with no prefetching, outperforming the best-performing rule-based prefetcher BOP by 10.44%, and ML-based prefetcher Voyager by 6.64%.  ( 2 min )
    Group-Invariant Quantum Machine Learning. (arXiv:2205.02261v1 [quant-ph])
    Quantum Machine Learning (QML) models are aimed at learning from data encoded in quantum states. Recently, it has been shown that models with little to no inductive biases (i.e., with no assumptions about the problem embedded in the model) are likely to have trainability and generalization issues, especially for large problem sizes. As such, it is fundamental to develop schemes that encode as much information as available about the problem at hand. In this work we present a simple, yet powerful, framework where the underlying invariances in the data are used to build QML models that, by construction, respect those symmetries. These so-called group-invariant models produce outputs that remain invariant under the action of any element of the symmetry group $\mathfrak{G}$ associated with the dataset. We present theoretical results underpinning the design of $\mathfrak{G}$-invariant models, and exemplify their application through several paradigmatic QML classification tasks including cases when $\mathfrak{G}$ is a continuous Lie group and also when it is a discrete symmetry group. Notably, our framework allows us to recover, in an elegant way, several well-known algorithms from the literature, as well as to discover new ones. Taken together, we expect that our results will help pave the way towards a more geometric and group-theoretic approach to QML model design.  ( 2 min )
    Learning Individual Interactions from Population Dynamics with Discrete-Event Simulation Model. (arXiv:2205.02332v1 [cs.LG])
    The abundance of data allows researchers to pursue more powerful computational tools to learn the dynamics of complex systems, such as neural networks, engineered systems and social networks. Traditional machine learning approaches capture complex system dynamics either with dynamic Bayesian networks and state space models, which are hard to scale because it is non-trivial to prescribe the dynamics with a sparse graph or a system of differential equations; or with deep neural networks, where the distributed representation of the learned dynamics is hard to interpret. In this paper, we explore the possibility of learning a discrete-event simulation representation of complex system dynamics assuming a multivariate normal distribution of the state variables, based on the observation that many complex system dynamics can be decomposed into a sequence of local interactions, which individually change the system state only minimally but in sequence generate complex and diverse dynamics. Our results show that the algorithm can data-efficiently capture complex network dynamics in several fields with meaningful events.  ( 2 min )
    Minimum Cost Intervention Design for Causal Effect Identification. (arXiv:2205.02232v1 [cs.LG])
    Pearl's do-calculus is a complete axiomatic approach to learn the identifiable causal effects from observational data. When such an effect is not identifiable, it is necessary to perform a collection of often costly interventions in the system to learn the causal effect. In this work, we consider the problem of designing the collection of interventions with the minimum cost to identify the desired effect. First, we prove that this problem is NP-hard, and subsequently propose an algorithm that can either find the optimal solution or a logarithmic-factor approximation of it. This is done by establishing a connection between our problem and the minimum hitting set problem. Additionally, we propose several polynomial-time heuristic algorithms to tackle the computational complexity of the problem. Although these algorithms could potentially stumble on sub-optimal solutions, our simulations show that they achieve small regrets on random graphs.  ( 2 min )
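    The abstract ties the logarithmic-factor guarantee to the minimum hitting set problem. As a minimal sketch, the classic weighted greedy heuristic for hitting set (known to give a logarithmic-factor approximation) looks like this. It is shown only to illustrate the combinatorial problem; how the paper constructs the sets from a causal graph, and whether its algorithm is this exact greedy rule, is not stated in the abstract. All names are hypothetical.

```python
def greedy_hitting_set(sets, cost):
    """Choose elements so that every set contains at least one chosen element.

    sets : iterable of non-empty collections of hashable elements
    cost : dict mapping each element to a positive cost
    Greedy rule: repeatedly pick the element with the lowest cost per newly
    hit set (ties broken by the element's string form for determinism).
    """
    remaining = [set(s) for s in sets]
    if any(not s for s in remaining):
        raise ValueError("an empty set can never be hit")
    chosen = []
    while remaining:
        candidates = set().union(*remaining)
        best = min(
            candidates,
            key=lambda e: (cost[e] / sum(1 for s in remaining if e in s), str(e)),
        )
        chosen.append(best)
        # Drop every set that the new element hits.
        remaining = [s for s in remaining if best not in s]
    return chosen


if __name__ == "__main__":
    sets = [{"a", "b"}, {"b", "c"}, {"c", "d"}]
    print(greedy_hitting_set(sets, {"a": 1, "b": 1, "c": 1, "d": 1}))
```

    In the intervention-design reading, each element would correspond to a variable that can be intervened on and its cost to the price of that intervention.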
  • Open

    Fitting an immersed submanifold to data via Sussmann's orbit theorem. (arXiv:2204.01119v2 [cs.LG] UPDATED)
    This paper describes an approach for fitting an immersed submanifold of a finite-dimensional Euclidean space to random samples. The reconstruction mapping from the ambient space to the desired submanifold is implemented as a composition of an encoder that maps each point to a tuple of (positive or negative) times and a decoder given by a composition of flows along finitely many vector fields starting from a fixed initial point. The encoder supplies the times for the flows. The encoder-decoder map is obtained by empirical risk minimization, and a high-probability bound is given on the excess risk relative to the minimum expected reconstruction error over a given class of encoder-decoder maps. The proposed approach makes fundamental use of Sussmann's orbit theorem, which guarantees that the image of the reconstruction map is indeed contained in an immersed submanifold.  ( 2 min )
    Tracking the risk of a deployed model and detecting harmful distribution shifts. (arXiv:2110.06177v4 [stat.ML] UPDATED)
    When deployed in the real world, machine learning models inevitably encounter changes in the data distribution, and certain -- but not all -- distribution shifts could result in significant performance degradation. In practice, it may make sense to ignore benign shifts, under which the performance of a deployed model does not degrade substantially, making interventions by a human expert (or model retraining) unnecessary. While several works have developed tests for distribution shifts, these typically either use non-sequential methods, or detect arbitrary shifts (benign or harmful), or both. We argue that a sensible method for firing off a warning has to both (a) detect harmful shifts while ignoring benign ones, and (b) allow continuous monitoring of model performance without increasing the false alarm rate. In this work, we design simple sequential tools for testing if the difference between source (training) and target (test) distributions leads to a significant increase in a risk function of interest, like accuracy or calibration. Recent advances in constructing time-uniform confidence sequences allow efficient aggregation of statistical evidence accumulated during the tracking process. The designed framework is applicable in settings where (some) true labels are revealed after the prediction is performed, or when batches of labels become available in a delayed fashion. We demonstrate the efficacy of the proposed framework through an extensive empirical study on a collection of simulated and real datasets.  ( 2 min )
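    To make the idea of continuous monitoring without inflating the false alarm rate concrete, here is a deliberately simple sketch. It is not the paper's construction: instead of the modern confidence sequences the abstract refers to, it uses Hoeffding's inequality with a union bound over time, which is valid but looser. It assumes i.i.d. losses in [0, 1], a known source risk, and a user-chosen tolerance; all names are hypothetical.

```python
import math


def risk_lower_bounds(losses, delta=0.05):
    """Running lower confidence bounds on the mean of losses bounded in [0, 1].

    With probability at least 1 - delta, every bound is below the true mean
    simultaneously, because the per-step error budgets delta / (t * (t + 1))
    sum to delta over all t >= 1.
    """
    bounds, total = [], 0.0
    for t, loss in enumerate(losses, start=1):
        total += loss
        delta_t = delta / (t * (t + 1))
        radius = math.sqrt(math.log(1.0 / delta_t) / (2.0 * t))
        bounds.append(total / t - radius)
    return bounds


def first_alarm(losses, source_risk, tolerance, delta=0.05):
    """First time the target risk provably exceeds source_risk + tolerance, else None."""
    for t, lower in enumerate(risk_lower_bounds(losses, delta), start=1):
        if lower > source_risk + tolerance:
            return t
    return None
```

    Because the alarm fires only when the lower bound clears the tolerance, shifts that leave the risk essentially unchanged are ignored, which mirrors requirement (a) in the abstract.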
    Using Time-Series Privileged Information for Provably Efficient Learning of Prediction Models. (arXiv:2110.14993v2 [cs.LG] UPDATED)
    We study prediction of future outcomes with supervised models that use privileged information during learning. The privileged information comprises samples of time series observed between the baseline time of prediction and the future outcome; this information is only available at training time which differs from the traditional supervised learning. Our question is when using this privileged data leads to more sample-efficient learning of models that use only baseline data for predictions at test time. We give an algorithm for this setting and prove that when the time series are drawn from a non-stationary Gaussian-linear dynamical system of fixed horizon, learning with privileged information is more efficient than learning without it. On synthetic data, we test the limits of our algorithm and theory, both when our assumptions hold and when they are violated. On three diverse real-world datasets, we show that our approach is generally preferable to classical learning, particularly when data is scarce. Finally, we relate our estimator to a distillation approach both theoretically and empirically.  ( 2 min )
    Local Latin Hypercube Refinement for Multi-objective Design Uncertainty Optimization. (arXiv:2108.08890v2 [stat.ML] UPDATED)
    Optimizing the reliability and the robustness of a design is important but often unaffordable due to high sample requirements. Surrogate models based on statistical and machine learning methods are used to increase the sample efficiency. However, for higher dimensional or multi-modal systems, surrogate models may also require a large number of samples to achieve good results. We propose a sequential sampling strategy for the surrogate based solution of multi-objective reliability based robust design optimization problems. The proposed local Latin hypercube refinement (LoLHR) strategy is model-agnostic and can be combined with any surrogate model because there is no free lunch but possibly a budget one. The proposed method is compared to stationary sampling as well as other proposed strategies from the literature. Gaussian process and support vector regression are both used as surrogate models. Empirical evidence is presented, showing that LoLHR achieves on average better results compared to other surrogate based strategies on the tested examples.  ( 2 min )
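    As background for the sampling strategy named above, this is a minimal sketch of a basic Latin hypercube design on the unit cube. It shows only the underlying stratification idea, not the local refinement step that LoLHR adds, which the abstract does not describe in detail. The function name is hypothetical.

```python
import random


def latin_hypercube(n_samples, n_dims, seed=0):
    """Return n_samples points in [0, 1)^n_dims forming a Latin hypercube.

    Each axis is cut into n_samples equal strata, and every stratum of every
    axis contains exactly one point.
    """
    rng = random.Random(seed)
    columns = []
    for _ in range(n_dims):
        strata = list(range(n_samples))
        rng.shuffle(strata)  # independent random pairing of strata per axis
        # Jitter each point uniformly inside its assigned stratum.
        columns.append([(k + rng.random()) / n_samples for k in strata])
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


if __name__ == "__main__":
    for point in latin_hypercube(5, 2):
        print(point)
```

    Compared with plain uniform sampling, this guarantees coverage of every marginal stratum with the same number of samples, which is why it is a common starting design for surrogate models.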
    Dropout Strikes Back: Improved Uncertainty Estimation via Diversity Sampling. (arXiv:2003.03274v3 [cs.LG] UPDATED)
    Uncertainty estimation for machine learning models is of high importance in many scenarios such as constructing the confidence intervals for model predictions and detection of out-of-distribution or adversarially generated points. In this work, we show that modifying the sampling distributions for dropout layers in neural networks improves the quality of uncertainty estimation. Our main idea consists of two main steps: computing data-driven correlations between neurons and generating samples, which include maximally diverse neurons. In a series of experiments on simulated and real-world data, we demonstrate that the diversification via determinantal point processes-based sampling achieves state-of-the-art results in uncertainty estimation for regression and classification tasks. An important feature of our approach is that it does not require any modification to the models or training procedures, allowing straightforward application to any deep learning model with dropout layers.  ( 2 min )
    Partition MCMC for inference on acyclic digraphs. (arXiv:1504.05006v2 [stat.ML] CROSS LISTED)
    Acyclic digraphs are the underlying representation of Bayesian networks, a widely used class of probabilistic graphical models. Learning the underlying graph from data is a way of gaining insights about the structural properties of a domain. Structure learning forms one of the inference challenges of statistical graphical models. MCMC methods, notably structure MCMC, to sample graphs from the posterior distribution given the data are probably the only viable option for Bayesian model averaging. Score modularity and restrictions on the number of parents of each node allow the graphs to be grouped into larger collections, which can be scored as a whole to improve the chain's convergence. Current examples of algorithms taking advantage of grouping are the biased order MCMC, which acts on the alternative space of permuted triangular matrices, and non ergodic edge reversal moves. Here we propose a novel algorithm, which employs the underlying combinatorial structure of DAGs to define a new grouping. As a result convergence is improved compared to structure MCMC, while still retaining the property of producing an unbiased sample. Finally the method can be combined with edge reversal moves to improve the sampler further.  ( 2 min )
    Hardness of Noise-Free Learning for Two-Hidden-Layer Neural Networks. (arXiv:2202.05258v2 [cs.LG] UPDATED)
    We give superpolynomial statistical query (SQ) lower bounds for learning two-hidden-layer ReLU networks with respect to Gaussian inputs in the standard (noise-free) model. No general SQ lower bounds were known for learning ReLU networks of any depth in this setting: previous SQ lower bounds held only for adversarial noise models (agnostic learning) or restricted models such as correlational SQ. Prior work hinted at the impossibility of our result: Vempala and Wilmes showed that general SQ lower bounds cannot apply to any real-valued family of functions that satisfies a simple non-degeneracy condition. To circumvent their result, we refine a lifting procedure due to Daniely and Vardi that reduces Boolean PAC learning problems to Gaussian ones. We show how to extend their technique to other learning models and, in many well-studied cases, obtain a more efficient reduction. As such, we also prove new cryptographic hardness results for PAC learning two-hidden-layer ReLU networks, as well as new lower bounds for learning constant-depth ReLU networks from label queries.  ( 2 min )
    A Change Dynamic Model for the Online Detection of Gradual Change. (arXiv:2205.01054v3 [stat.ML] UPDATED)
    Changes in the statistical properties of a stochastic process are typically assumed to occur via change-points, which demark instantaneous moments of complete and total change in process behavior. In cases where these transitions occur gradually, this assumption can result in a reduced ability to properly identify and respond to process change. With this observation in mind, we introduce a novel change-dynamic model for the online detection of gradual change in a Bayesian framework, in which change-points are used within a hierarchical model to indicate moments of gradual change onset or termination. We apply this model to synthetic data and EEG readings drawn during epileptic seizure, where we find our change-dynamic model can enable faster and more accurate identification of gradual change than traditional change-point models allow.  ( 2 min )
    Linear Discriminant Analysis with High-dimensional Mixed Variables. (arXiv:2112.07145v2 [stat.ME] UPDATED)
    Datasets containing both categorical and continuous variables are frequently encountered in many areas, and with the rapid development of modern measurement technologies, the dimensions of these variables can be very high. Despite the recent progress made in modelling high-dimensional data for continuous variables, there is a scarcity of methods that can deal with a mixed set of variables. To fill this gap, this paper develops a novel approach for classifying high-dimensional observations with mixed variables. Our framework builds on a location model, in which the distributions of the continuous variables conditional on categorical ones are assumed Gaussian. We overcome the challenge of having to split data into exponentially many cells, or combinations of the categorical variables, by kernel smoothing, and provide new perspectives for its bandwidth choice to ensure an analogue of Bochner's Lemma, which is different to the usual bias-variance tradeoff. We show that the two sets of parameters in our model can be separately estimated and provide penalized likelihood for their estimation. Results on the estimation accuracy and the misclassification rates are established, and the competitive performance of the proposed classifier is illustrated by extensive simulation and real data studies.  ( 2 min )
    Approximate exploitability: Learning a best response in large games. (arXiv:2004.09677v4 [cs.LG] UPDATED)
    Researchers have demonstrated that neural networks are vulnerable to adversarial examples and subtle environment changes, both of which one can view as a form of distribution shift. To humans, the resulting errors can look like blunders, eroding trust in these agents. In prior games research, agent evaluation often focused on the in-practice game outcomes. While valuable, such evaluation typically fails to evaluate robustness to worst-case outcomes. Prior research in computer poker has examined how to assess such worst-case performance, both exactly and approximately. Unfortunately, exact computation is infeasible with larger domains, and existing approximations rely on poker-specific knowledge. We introduce ISMCTS-BR, a scalable search-based deep reinforcement learning algorithm for learning a best response to an agent, thereby approximating worst-case performance. We demonstrate the technique in several two-player zero-sum games against a variety of agents, including several AlphaZero-based agents.  ( 2 min )
    Addendum on the scoring of Gaussian directed acyclic graphical models. (arXiv:1402.6863v4 [stat.ML] CROSS LISTED)
    We provide a correction to the expression for scoring Gaussian directed acyclic graphical models derived in Geiger and Heckerman [Ann. Statist. 30 (2002) 1414-1440] and discuss how to evaluate the score efficiently.  ( 2 min )
    Resilience of Bayesian Layer-Wise Explanations under Adversarial Attacks. (arXiv:2102.11010v3 [cs.LG] UPDATED)
    We consider the problem of the stability of saliency-based explanations of Neural Network predictions under adversarial attacks in a classification task. Saliency interpretations of deterministic Neural Networks are remarkably brittle even when the attacks fail, i.e. for attacks that do not change the classification label. We empirically show that interpretations provided by Bayesian Neural Networks are considerably more stable under adversarial perturbations of the inputs and even under direct attacks to the explanations. By leveraging recent results, we also provide a theoretical explanation of this result in terms of the geometry of the data manifold. Additionally, we discuss the stability of the interpretations of high level representations of the inputs in the internal layers of a Network. Our results demonstrate that Bayesian methods, in addition to being more robust to adversarial attacks, have the potential to provide more stable and interpretable assessments of Neural Network predictions.  ( 2 min )
    Communication-Efficient Device Scheduling for Federated Learning Using Stochastic Optimization. (arXiv:2201.07912v2 [cs.LG] UPDATED)
    Federated learning (FL) is a useful tool in distributed machine learning that utilizes users' local datasets in a privacy-preserving manner. However, when deploying FL in a constrained wireless environment, training models in a time-efficient manner can be a challenging task due to intermittent connectivity of devices, heterogeneous connection quality, and non-i.i.d. data. In this paper, we provide a novel convergence analysis of non-convex loss functions using FL on both i.i.d. and non-i.i.d. datasets with arbitrary device selection probabilities for each round. Then, using the derived convergence bound, we use stochastic optimization to develop a new client selection and power allocation algorithm that minimizes a function of the convergence bound and the average communication time under a transmit power constraint. We find an analytical solution to the minimization problem. One key feature of the algorithm is that knowledge of the channel statistics is not required and only the instantaneous channel state information needs to be known. Using the FEMNIST and CIFAR-10 datasets, we show through simulations that the communication time can be significantly decreased using our algorithm, compared to uniformly random participation.  ( 2 min )
    Riemannian classification of EEG signals with missing values. (arXiv:2110.10011v2 [cs.HC] UPDATED)
    This paper proposes a strategy to handle missing data for the classification of electroencephalograms using covariance matrices. It relies on the observed-data likelihood within an expectation-maximization algorithm. This approach is compared to two existing state-of-the-art methods: (i) covariance matrices computed with imputed data; (ii) Riemannian averages of partially observed covariance matrices. All approaches are combined with the minimum distance to Riemannian mean classifier and applied to a classification task of two widely known paradigms of brain-computer interfaces. In addition to being applicable to a wider range of missing data scenarios, the proposed strategy generally performs better than other methods on the considered real EEG data.  ( 2 min )
    Non-Euclidean Differentially Private Stochastic Convex Optimization: Optimal Rates in Linear Time. (arXiv:2103.01278v2 [cs.LG] UPDATED)
    Differentially private (DP) stochastic convex optimization (SCO) is a fundamental problem, where the goal is to approximately minimize the population risk with respect to a convex loss function, given a dataset of $n$ i.i.d. samples from a distribution, while satisfying differential privacy with respect to the dataset. Most of the existing works in the literature of private convex optimization focus on the Euclidean (i.e., $\ell_2$) setting, where the loss is assumed to be Lipschitz (and possibly smooth) w.r.t. the $\ell_2$ norm over a constraint set with bounded $\ell_2$ diameter. Algorithms based on noisy stochastic gradient descent (SGD) are known to attain the optimal excess risk in this setting. In this work, we conduct a systematic study of DP-SCO for $\ell_p$-setups under a standard smoothness assumption on the loss. For $1< p\leq 2$, under a standard smoothness assumption, we give a new, linear-time DP-SCO algorithm with optimal excess risk. Previously known constructions with optimal excess risk for $1< p <2$ run in super-linear time in $n$. For $p=1$, we give an algorithm with nearly optimal excess risk. Our result for the $\ell_1$-setup also extends to general polyhedral norms and feasible sets. Moreover, we show that the excess risk bounds resulting from our algorithms for $1\leq p \leq 2$ are attained with high probability. For $2 < p \leq \infty$, we show that existing linear-time constructions for the Euclidean setup attain a nearly optimal excess risk in the low-dimensional regime. As a consequence, we show that such constructions attain a nearly optimal excess risk for $p=\infty$. Our work draws upon concepts from the geometry of normed spaces, such as the notions of regularity, uniform convexity, and uniform smoothness.  ( 3 min )
    Is Pessimism Provably Efficient for Offline RL?. (arXiv:2012.15085v3 [cs.LG] UPDATED)
    We study offline reinforcement learning (RL), which aims to learn an optimal policy based on a dataset collected a priori. Due to the lack of further interactions with the environment, offline RL suffers from the insufficient coverage of the dataset, which eludes most existing theoretical analysis. In this paper, we propose a pessimistic variant of the value iteration algorithm (PEVI), which incorporates an uncertainty quantifier as the penalty function. Such a penalty function simply flips the sign of the bonus function for promoting exploration in online RL, which makes it easily implementable and compatible with general function approximators. Without assuming the sufficient coverage of the dataset, we establish a data-dependent upper bound on the suboptimality of PEVI for general Markov decision processes (MDPs). When specialized to linear MDPs, it matches the information-theoretic lower bound up to multiplicative factors of the dimension and horizon. In other words, pessimism is not only provably efficient but also minimax optimal. In particular, given the dataset, the learned policy serves as the "best effort" among all policies, as no other policies can do better. Our theoretical analysis identifies the critical role of pessimism in eliminating a notion of spurious correlation, which emerges from the "irrelevant" trajectories that are less covered by the dataset and not informative for the optimal policy.  ( 2 min )
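    The core idea of subtracting an uncertainty penalty instead of adding an exploration bonus can be sketched in a small tabular setting. The code below is only an illustration under stated assumptions: the abstract covers general and linear MDPs, whereas this sketch uses a finite-horizon tabular MDP, a count-based penalty `beta / sqrt(n)`, and rewards in [0, 1]. The function name, the data layout, and the penalty form are all hypothetical choices, not the paper's construction.

```python
import math
from collections import defaultdict


def pessimistic_value_iteration(dataset, n_states, n_actions, horizon, beta=1.0):
    """Tabular sketch: dataset is a list of (h, s, a, r, s_next), rewards in [0, 1]."""
    counts = defaultdict(int)
    reward_sum = defaultdict(float)
    next_counts = defaultdict(lambda: defaultdict(int))
    for h, s, a, r, s_next in dataset:
        counts[(h, s, a)] += 1
        reward_sum[(h, s, a)] += r
        next_counts[(h, s, a)][s_next] += 1

    # value[horizon] stays zero: nothing is collected after the last step.
    value = [[0.0] * n_states for _ in range(horizon + 1)]
    policy = [[0] * n_states for _ in range(horizon)]
    for h in reversed(range(horizon)):
        for s in range(n_states):
            best_q, best_a = -math.inf, 0
            for a in range(n_actions):
                n = counts.get((h, s, a), 0)
                if n == 0:
                    q = 0.0  # never observed: fall back to the lowest possible value
                else:
                    r_hat = reward_sum[(h, s, a)] / n
                    next_v = sum(c * value[h + 1][s2]
                                 for s2, c in next_counts[(h, s, a)].items()) / n
                    penalty = beta / math.sqrt(n)  # assumed uncertainty quantifier
                    # Subtract the penalty (online RL would add it as a bonus).
                    q = min(max(r_hat + next_v - penalty, 0.0), float(horizon - h))
                if q > best_q:
                    best_q, best_a = q, a
            value[h][s] = best_q
            policy[h][s] = best_a
    return policy, value


if __name__ == "__main__":
    # Action 0 looks best on average but was seen once; action 1 was seen 100 times.
    data = [(0, 0, 0, 1.0, 0)] + [(0, 0, 1, 0.9, 0)] * 100
    print(pessimistic_value_iteration(data, 1, 2, 1, beta=1.0)[0])  # pessimistic choice
    print(pessimistic_value_iteration(data, 1, 2, 1, beta=0.0)[0])  # greedy on estimates
```

    In the toy dataset, the rarely observed action has the higher empirical reward, so a purely greedy planner (`beta=0`) selects it. The penalized version prefers the well-covered action, which is the behavior the abstract attributes to pessimism.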
    Automated Imbalanced Classification via Layered Learning. (arXiv:2205.02553v1 [cs.LG])
    In this paper we address imbalanced binary classification (IBC) tasks. Applying resampling strategies to balance the class distribution of training instances is a common approach to tackle these problems. Many state-of-the-art methods find instances of interest close to the decision boundary to drive the resampling process. However, under-sampling the majority class may potentially lead to important information loss. Over-sampling also may increase the chance of overfitting by propagating the information contained in instances from the minority class. The main contribution of our work is a new method called ICLL for tackling IBC tasks which is not based on resampling training observations. Instead, ICLL follows a layered learning paradigm to model the data in two stages. In the first layer, ICLL learns to distinguish cases close to the decision boundary from cases which are clearly from the majority class, where this dichotomy is defined using a hierarchical clustering analysis. In the subsequent layer, we use instances close to the decision boundary and instances from the minority class to solve the original predictive task. A second contribution of our work is the automatic definition of the layers which comprise the layered learning strategy using a hierarchical clustering model. This is a relevant discovery as this process is usually performed manually according to domain knowledge. We carried out extensive experiments using 100 benchmark data sets. The results show that the proposed method leads to better performance relative to several state-of-the-art methods for IBC.  ( 2 min )
    Holistic Approach to Measure Sample-level Adversarial Vulnerability and its Utility in Building Trustworthy Systems. (arXiv:2205.02604v1 [cs.CV])
    Adversarial attack perturbs an image with an imperceptible noise, leading to incorrect model prediction. Recently, a few works showed inherent bias associated with such attack (robustness bias), where certain subgroups in a dataset (e.g. based on class, gender, etc.) are less robust than others. This bias not only persists even after adversarial training, but often results in severe performance discrepancies across these subgroups. Existing works characterize the subgroup's robustness bias by only checking individual sample's proximity to the decision boundary. In this work, we argue that this measure alone is not sufficient and validate our argument via extensive experimental analysis. It has been observed that adversarial attacks often corrupt the high-frequency components of the input image. We, therefore, propose a holistic approach for quantifying adversarial vulnerability of a sample by combining these different perspectives, i.e., degree of model's reliance on high-frequency features and the (conventional) sample-distance to the decision boundary. We demonstrate that by reliably estimating adversarial vulnerability at the sample level using the proposed holistic metric, it is possible to develop a trustworthy system where humans can be alerted about the incoming samples that are highly likely to be misclassified at test time. This is achieved with better precision when our holistic metric is used over individual measures. To further corroborate the utility of the proposed holistic approach, we perform knowledge distillation in a limited-sample setting. We observe that the student network trained with the subset of samples selected using our combined metric performs better than both the competing baselines, viz., where samples are selected randomly or based on their distances to the decision boundary.  ( 2 min )
    Quantum Extremal Learning. (arXiv:2205.02807v1 [quant-ph])
    We propose a quantum algorithm for `extremal learning', which is the process of finding the input to a hidden function that extremizes the function output, without having direct access to the hidden function, given only partial input-output (training) data. The algorithm, called quantum extremal learning (QEL), consists of a parametric quantum circuit that is variationally trained to model data input-output relationships and where a trainable quantum feature map, that encodes the input data, is analytically differentiated in order to find the coordinate that extremizes the model. This enables the combination of established quantum machine learning modelling with established quantum optimization, on a single circuit/quantum computer. We have tested our algorithm on a range of classical datasets based on either discrete or continuous input variables, both of which are compatible with the algorithm. In case of discrete variables, we test our algorithm on synthetic problems formulated based on Max-Cut problem generators and also considering higher order correlations in the input-output relationships. In case of the continuous variables, we test our algorithm on synthetic datasets in 1D and simple ordinary differential functions. We find that the algorithm is able to successfully find the extremal value of such problems, even when the training dataset is sparse or a small fraction of the input configuration space. We additionally show how the algorithm can be used for much more general cases of higher dimensionality, complex differential equations, and with full flexibility in the choice of both modeling and optimization ansatz. We envision that due to its general framework and simple construction, the QEL algorithm will be able to solve a wide variety of applications in different fields, opening up areas of further research.  ( 2 min )
    Communication-Efficient Adaptive Federated Learning. (arXiv:2205.02719v1 [cs.LG])
    Federated learning is a machine learning training paradigm that enables clients to jointly train models without sharing their own localized data. However, the implementation of federated learning in practice still faces numerous challenges, such as the large communication overhead due to the repetitive server-client synchronization and the lack of adaptivity by SGD-based model updates. Although various methods have been proposed for reducing the communication cost by gradient compression or quantization, and federated versions of adaptive optimizers such as FedAdam have been proposed to add more adaptivity, the current federated learning framework still cannot solve the aforementioned challenges all at once. In this paper, we propose a novel communication-efficient adaptive federated learning method (FedCAMS) with theoretical convergence guarantees. We show that in the nonconvex stochastic optimization setting, our proposed FedCAMS achieves the same convergence rate of $O(\frac{1}{\sqrt{TKm}})$ as its non-compressed counterparts. Extensive experiments on various benchmarks verify our theoretical analysis.  ( 2 min )
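    To make the notion of gradient compression mentioned above concrete, here is a generic sketch of top-k sparsification with an error-feedback buffer, one common way to cut client-to-server traffic. It is not the FedCAMS algorithm, whose details the abstract does not give. The names `top_k_compress` and `ErrorFeedbackCompressor`, and the choice of top-k as the compressor, are assumptions made for illustration.

```python
def top_k_compress(vector, k):
    """Keep the k largest-magnitude entries and zero out the rest."""
    order = sorted(range(len(vector)), key=lambda i: abs(vector[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(vector)]


class ErrorFeedbackCompressor:
    """Remembers what compression dropped and re-adds it to the next update."""

    def __init__(self, dim, k):
        self.k = k
        self.residual = [0.0] * dim

    def compress(self, update):
        # Add back the part of earlier updates that was never transmitted.
        corrected = [u + e for u, e in zip(update, self.residual)]
        sent = top_k_compress(corrected, self.k)
        # Store the newly dropped part for the next round.
        self.residual = [c - s for c, s in zip(corrected, sent)]
        return sent


if __name__ == "__main__":
    client = ErrorFeedbackCompressor(dim=4, k=2)
    print(client.compress([0.1, -3.0, 2.0, 0.5]))
    print(client.compress([0.1, 0.0, 0.0, 0.5]))
```

    Only `k` of the `dim` coordinates are transmitted each round. The residual buffer ensures that small coordinates are delayed rather than discarded, which is the usual motivation for pairing sparsification with error feedback.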
    Bivariate vine copula based quantile regression. (arXiv:2205.02557v1 [stat.ME])
    The statistical analysis of univariate quantiles is a well developed research topic. However, there is a profound need for research in multivariate quantiles. We tackle the topic of bivariate quantiles and bivariate quantile regression using vine copulas. They are graph theoretical models identified by a sequence of linked trees, which allow for separate modelling of marginal distributions and the dependence structure. We introduce a novel graph structure model (given by a tree sequence) specifically designed for a symmetric treatment of two responses in a predictive regression setting. We establish computational tractability of the model and a straightforward way of obtaining different conditional distributions. Using vine copulas, the typical shortfalls of regression, such as the need for transformations or interactions of predictors, collinearity, or quantile crossings, are avoided. We illustrate the copula based bivariate quantiles for different copula distributions and provide a data set example. Further, the data example emphasizes the benefits of joint bivariate response modelling, in contrast to two separate univariate regressions or assuming conditional independence, for a bivariate response data set in the presence of conditional dependence.  ( 2 min )
    The interventional Bayesian Gaussian equivalent score for Bayesian causal inference with unknown soft interventions. (arXiv:2205.02602v1 [stat.ME])
    Describing the causal relations governing a system is a fundamental task in many scientific fields, ideally addressed by experimental studies. However, obtaining data under intervention scenarios may not always be feasible, while discovering causal relations from purely observational data is notoriously challenging. In certain settings, such as genomics, we may have data from heterogeneous study conditions, with soft (partial) interventions only pertaining to a subset of the study variables, whose effects and targets are possibly unknown. Combining data from experimental and observational studies offers the opportunity to leverage both domains and improve on the identifiability of causal structures. To this end, we define the interventional BGe score for a mixture of observational and interventional data, where the targets and effects of intervention may be unknown. To demonstrate the approach we compare its performance to other state-of-the-art algorithms, both in simulations and data analysis applications. An advantage of our method is that it takes a Bayesian perspective leading to a full characterisation of the posterior distribution of the DAG structures. Given a sample of DAGs one can also automatically derive full posterior distributions of the intervention effects. Consequently the method effectively captures the uncertainty both in the structure and the parameter estimates. Code to reproduce the simulations and analyses is publicly available at github.com/jackkuipers/iBGe  ( 2 min )
    Pessimism meets VCG: Learning Dynamic Mechanism Design via Offline Reinforcement Learning. (arXiv:2205.02450v1 [cs.LG])
    Dynamic mechanism design has garnered significant attention from both computer scientists and economists in recent years. By allowing agents to interact with the seller over multiple rounds, where agents' reward functions may change with time and are state dependent, the framework is able to model a rich class of real world problems. In these works, the interaction between agents and sellers is often assumed to follow a Markov Decision Process (MDP). We focus on the setting where the reward and transition functions of such an MDP are not known a priori, and we are attempting to recover the optimal mechanism using an a priori collected data set. In the setting where function approximation is employed to handle large state spaces, with only mild assumptions on the expressiveness of the function class, we are able to design a dynamic mechanism using offline reinforcement learning algorithms. Moreover, the learned mechanisms approximately satisfy three key desiderata: efficiency, individual rationality, and truthfulness. Our algorithm is based on the pessimism principle and only requires a mild assumption on the coverage of the offline data set. To the best of our knowledge, our work provides the first offline RL algorithm for dynamic mechanism design without assuming uniform coverage.  ( 2 min )
    Generative methods for sampling transition paths in molecular dynamics. (arXiv:2205.02818v1 [stat.ML])
    Molecular systems often remain trapped for long times around some local minimum of the potential energy function, before switching to another one -- a behavior known as metastability. Simulating transition paths linking one metastable state to another one is difficult by direct numerical methods. In view of the promises of machine learning techniques, we explore in this work two approaches to more efficiently generate transition paths: sampling methods based on generative models such as variational autoencoders, and importance sampling methods based on reinforcement learning.  ( 2 min )
    Polynomial-Time Algorithms for Counting and Sampling Markov Equivalent DAGs with Applications. (arXiv:2205.02654v1 [cs.LG])
    Counting and sampling directed acyclic graphs from a Markov equivalence class are fundamental tasks in graphical causal analysis. In this paper we show that these tasks can be performed in polynomial time, solving a long-standing open problem in this area. Our algorithms are effective and easily implementable. As we show in experiments, these breakthroughs make thought-to-be-infeasible strategies in active learning of causal structures and causal effect identification with regard to a Markov equivalence class practically applicable.  ( 2 min )
    REDS: Rule Extraction for Discovering Scenarios. (arXiv:1910.01713v2 [cs.LG] UPDATED)
    Scenario discovery is the process of finding areas of interest, known as scenarios, in data spaces resulting from simulations. For instance, one might search for conditions, i.e., inputs of the simulation model, where the system is unstable. Subgroup discovery methods are commonly used for scenario discovery. They find scenarios in the form of hyperboxes, which are easy to comprehend. Given a computational budget, results tend to get worse as the number of inputs of the simulation model and the cost of simulations increase. We propose a new procedure for scenario discovery from few simulations, dubbed REDS. A key ingredient is using an intermediate machine learning model to label data for subsequent use by conventional subgroup discovery methods. We provide statistical arguments why this is an improvement. In our experiments, REDS reduces the number of simulations required by 50--75\% on average, depending on the quality measure. It is also useful as a semi-supervised subgroup discovery method and for discovering better scenarios from third-party data, when a simulation model is not available.  ( 2 min )
    Dynamic Bayesian Network Auxiliary ABC-SMC for Hybrid Model Bayesian Inference to Accelerate Biomanufacturing Process Mechanism Learning and Robust Control. (arXiv:2205.02410v1 [stat.ML])
    Driven by the critical needs of biomanufacturing 4.0, we present a probabilistic knowledge graph hybrid model characterizing complex spatial-temporal causal interdependencies of underlying bioprocessing mechanisms. It can faithfully capture the important properties, including nonlinear reactions, partially observed state, and nonstationary dynamics. Given limited process observations, we derive a posterior distribution quantifying model uncertainty, which can facilitate mechanism learning and support robust process control. To avoid evaluation of intractable likelihood, Approximate Bayesian Computation sampling with Sequential Monte Carlo (ABC-SMC) is developed to approximate the posterior distribution. Given high stochastic and model uncertainties, it is computationally expensive to match process output trajectories. Therefore, we propose a linear Gaussian dynamic Bayesian network (LG-DBN) auxiliary likelihood-based ABC-SMC algorithm. Through matching observed and simulated summary statistics, the proposed approach can dramatically reduce the computation cost and accelerate the posterior approximation convergence.  ( 2 min )
    Group-Invariant Quantum Machine Learning. (arXiv:2205.02261v1 [quant-ph])
    Quantum Machine Learning (QML) models are aimed at learning from data encoded in quantum states. Recently, it has been shown that models with little to no inductive biases (i.e., with no assumptions about the problem embedded in the model) are likely to have trainability and generalization issues, especially for large problem sizes. As such, it is fundamental to develop schemes that encode as much information as available about the problem at hand. In this work we present a simple, yet powerful, framework where the underlying invariances in the data are used to build QML models that, by construction, respect those symmetries. These so-called group-invariant models produce outputs that remain invariant under the action of any element of the symmetry group $\mathfrak{G}$ associated to the dataset. We present theoretical results underpinning the design of $\mathfrak{G}$-invariant models, and exemplify their application through several paradigmatic QML classification tasks including cases when $\mathfrak{G}$ is a continuous Lie group and also when it is a discrete symmetry group. Notably, our framework allows us to recover, in an elegant way, several well known algorithms from the literature, as well as to discover new ones. Taken together, we expect that our results will help pave the way towards a more geometric and group-theoretic approach to QML model design.  ( 2 min )
    DeepBayes -- an estimator for parameter estimation in stochastic nonlinear dynamical models. (arXiv:2205.02264v1 [stat.ML])
    Stochastic nonlinear dynamical systems are ubiquitous in modern, real-world applications. Yet, estimating the unknown parameters of stochastic, nonlinear dynamical models remains a challenging problem. The majority of existing methods employ maximum likelihood or Bayesian estimation. However, these methods suffer from some limitations, most notably the substantial computational time for inference coupled with limited flexibility in application. In this work, we propose DeepBayes estimators that leverage the power of deep recurrent neural networks in learning an estimator. The method consists of first training a recurrent neural network to minimize the mean-squared estimation error over a set of synthetically generated data using models drawn from the model set of interest. The a priori trained estimator can then be used directly for inference by evaluating the network with the estimation data. The deep recurrent neural network architectures can be trained offline and ensure significant time savings during inference. We experiment with two popular recurrent neural networks -- long short term memory network (LSTM) and gated recurrent unit (GRU). We demonstrate the applicability of our proposed method on different example models and perform detailed comparisons with state-of-the-art approaches. We also provide a study on a real-world nonlinear benchmark problem. The experimental evaluations show that the proposed approach is asymptotically as good as the Bayes estimator.  ( 2 min )
    Multivariate Prediction Intervals for Random Forests. (arXiv:2205.02260v1 [stat.ML])
    Accurate uncertainty estimates can significantly improve the performance of iterative design of experiments, as in Sequential and Reinforcement learning. For many such problems in engineering and the physical sciences, the design task depends on multiple correlated model outputs as objectives and/or constraints. To better solve these problems, we propose a recalibrated bootstrap method to generate multivariate prediction intervals for bagged models and show that it is well-calibrated. We apply the recalibrated bootstrap to a simulated sequential learning problem with multiple objectives and show that it leads to a marked decrease in the number of iterations required to find a satisfactory candidate. This indicates that the recalibrated bootstrap could be a valuable tool for practitioners using machine learning to optimize systems with multiple competing targets.  ( 2 min )

  • Open

    Does anyone know of an app that can rewrite an entire document in one go to sound more professional?
    I know there's Grammarly, but is there something that does it all in one go to make it more cohesive? Thanks! submitted by /u/chriscarmy [link] [comments]  ( 1 min )
    IVY: An Open-Source Tool To Make Deep Learning Code Compatible Across Frameworks
    As ML aficionados, we’ve all come across interesting projects on GitHub only to discover that they are not written in the framework we know and want to use. Reimplementing a whole codebase in our own framework is tedious, let alone dealing with the errors that may arise along the way. Wouldn’t it be good to have something that doesn’t care what framework you’re using, and that provides code in your desired framework, whether it is JAX, PyTorch, MXNet, NumPy, or TensorFlow? This is what IVY is attempting to do by unifying all ML frameworks. The number of open-source machine learning projects has surged significantly, as is evident from the fast-growing number of GitHub repositories using the keyword "deep learning". Because of the different frameworks, code shareability has been considerably hampered. For software development, where collaboration is vital, this is a significant bottleneck. As newer frameworks come onto the scene, framework-specific code quickly becomes obsolete, and transferring code across frameworks is akin to reinventing the wheel. GitHub: https://github.com/unifyai/ivy Paper: https://arxiv.org/pdf/2102.02886.pdf Project: https://lets-unify.ai/ivy/ submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    Any apps to have conversations with AI talking heads?
    I'm a software engineer and I've become interested in making an app where you can have a conversation with an AI-generated talking head. I guess the goal would be for it to feel like a video call with another human, except it's all AI-generated. TBH it wouldn't have to be human; even a cartoon character would be cool. The closest thing I can find suggests combining: Face Generation (StyleGAN), Text Generation (GPT), Text-To-Speech (FlowTron), and Lip-Sync Animation (LipGAN). https://medium.com/swlh/how-to-create-fake-talking-head-videos-with-deep-learning-code-tutorial-f9cfc0c19ab5 Does anybody know: (1) of any apps that are already doing something similar? (2) how to achieve this in an app? Most tutorials seem to be in Python, which won't run on the frontend, so I'm wondering if there's anything based on Swift, Java, Kotlin, React Native, etc. Thanks submitted by /u/djames843 [link] [comments]  ( 1 min )
    Why won't AI ever replace humans in the Healthcare field? I guess we don't have the technology right now, but won't AI be a lot more accurate and safer to diagnose diseases, etc?
    submitted by /u/UmbraShield [link] [comments]  ( 1 min )
    OpenAI founder Sam Altman sees a big AI revolution within this decade
    submitted by /u/much_successes [link] [comments]  ( 1 min )
    Paintings.
    submitted by /u/cookingandcraft [link] [comments]
    Best Natural Language Processing Courses you might know 2022
    Natural language processing (NLP) is the branch of computer science, and more specifically of artificial intelligence (AI), concerned with giving computers the ability to understand text and spoken words in much the same way human beings can. It's a technology that many people use daily and that has been around for years, but it is often taken for granted. One everyday example is spell check. Looking to improve your NLP skills? Here are the best natural language processing courses you might know in 2022. submitted by /u/Lakshmireddys [link] [comments]  ( 1 min )
    Meta's open-source new model OPT is GPT-3's closest competitor!
    submitted by /u/OnlyProggingForFun [link] [comments]  ( 1 min )
  • Open

    [D] Any good podcast discussing ML papers?
    I was wondering if there are any good podcasts discussing recent papers in ML space? Something similar to Yannic's videos but in podcast format. The closest one I have found is "TWIML AI". submitted by /u/KeikakuAccelerator [link] [comments]  ( 1 min )
    [D] Did ICCV Dramatically Change its Template?
    I'm preparing to re-submit my paper to a conference, and I'm looking at ICCV 2023. The call for papers provides a template here, but it doesn't look a thing like the template from the most recently released proceedings, ICCV 2021. The page limit (15 pages!!) is also dramatically different. Am I looking at the same ICCV? If so, does anyone know what the impetus was for the change? I can't say the new one looks nicer, though I am happy about the more generous space limits (which only feels appropriate in an age where most of these papers will only ever be read digitally...) submitted by /u/relenzo [link] [comments]  ( 1 min )
    [D] How does SHAP handle removed features?
    I understand that SHAP computes predictions for each possible feature combination. But how can this be done by simply passing clf.predict_proba as a parameter? Does it retrain the model every time the number of features changes for a combination? submitted by /u/savoga [link] [comments]  ( 1 min )
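A rough illustration of the idea behind the question, not SHAP's actual implementation: model-agnostic explainers of the Kernel SHAP kind generally do not retrain anything. They only need a prediction function, and they simulate "removing" a feature by substituting values from a background dataset and averaging the outputs. The names `masked_prediction`, `toy_model`, and `background` below are hypothetical, and `toy_model` merely stands in for something like clf.predict_proba.

```python
from typing import Callable, Sequence


def masked_prediction(
    predict: Callable[[Sequence[float]], float],
    x: Sequence[float],
    keep: Sequence[bool],
    background: Sequence[Sequence[float]],
) -> float:
    """Average model output when features with keep[i] == False are
    replaced by the value found in each background row."""
    total = 0.0
    for row in background:
        mixed = [xi if k else bi for xi, k, bi in zip(x, keep, row)]
        total += predict(mixed)
    return total / len(background)


# Toy stand-in for a trained model's prediction function (assumption).
def toy_model(features: Sequence[float]) -> float:
    return 2.0 * features[0] + 1.0 * features[1]


background = [[0.0, 0.0], [2.0, 4.0]]
x = [1.0, 3.0]

full = masked_prediction(toy_model, x, [True, True], background)
without_second = masked_prediction(toy_model, x, [True, False], background)
print(full, without_second)
```

The difference between the two numbers is the kind of quantity that gets weighted and combined over many feature subsets; the model itself is only ever called, never refitted.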
    API call-sequence and user-behaviour datasets “[R]” “[P]”
    Distributed micro-services based applications are typically accessed via APIs. These APIs are used either by apps or they can be accessed directly via programmatic means. Many a time API access is abused by attackers trying to exploit the business logic exposed by these APIs. The way normal users access these APIs is different from how the attackers access these APIs. Many applications have 100s of APIs that are called in a specific order, and depending on various factors such as browser refreshes, session refreshes, network errors, or programmatic access, these behaviors are not static and can vary for the same user. API calls in long-running sessions form access graphs that need to be analysed in order to discover attack patterns and anomalies. Graphs don't lend themselves to numerical computation. We address this issue and provide a dataset where user access behavior is qualified as numerical features. In addition, we provide a dataset where raw API call graphs are provided. Supporting the use of these datasets, two notebooks on classification, node embeddings and clustering are also provided. Check out the dataset and some notebooks that show how the dataset can be used at https://www.kaggle.com/datasets/tangodelta/api-access-behaviour-anomaly-dataset submitted by /u/indra_gunt [link] [comments]  ( 1 min )
    [R] Looking for resources on hybrid modeling using known physics and machine learning
    For my master's thesis (mechanical engineering) I'm working on a hybrid model of a robotic manipulator. The goal is to extend the existing physics-based model with a machine learning component that aims to learn the unmodelled dynamics. I've read about PINNs and PGNNs, but they don't quite seem to capture the methodology I'm looking for. To give a clearer direction, I want to develop a model similar to this paper. Thanks in advance. submitted by /u/MalleMinkukel [link] [comments]  ( 1 min )
    [R] Early Stoppable/Iterable Pre-trained Object Recognition Model
    Hey, I am currently working on my thesis in the field of microservices in mission-critical applications, and I am searching for some kind of applications (especially object recognition) that I can stop if a certain threshold is reached. At the point where the application stops, a certain result must already be present. Would be nice if anyone has an idea. Feel free to ask if you have any further questions. submitted by /u/mxsrv [link] [comments]  ( 1 min )
    Leaking private data from gradients [D]
    In "Communication-efficient learning of deep networks from decentralized data" (https://proceedings.mlr.press/v54/mcmahan17a/mcmahan17a.pdf) McMahan et al write the following (in footnote 1 on page 2): "If the update is the total gradient of the loss on all of the local data, and the features are a sparse bag-of-words, then the non-zero gradients reveal exactly which words the user has entered on the device.". For context, this paper assumes a set of clients have local private data and a global, joint model over all that data is to be learnt without collecting it to a central server. Clients compute gradients on local data in rounds, apply them and share updated weights with a central server. The footnote above refers to gradients, which are usually shared in distributed optimization, as opposed to parameters. The central server usually averages the gradients before a gradient descent step in that setting. I tried figuring out how the gradient of the loss with respect to one private, local client dataset is supposed to leak the text that user has typed (given bag-of-words features used in model), but haven't quite managed. There is neither proof nor citation in the paper, so it must be something obvious. submitted by /u/lemlo100 [link] [comments]  ( 2 min )
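One way to see the footnote's claim, sketched under the assumption of a simple logistic model on bag-of-words counts (the paper does not spell out a model, so this is only an illustration): the gradient for the weight of word j is a sum of terms (p - y) * x_j, and x_j is zero in every example where the word is absent. A word that never occurs locally therefore gets a gradient of exactly zero. The vocabulary, data, and function name below are made up for the example.

```python
import math
from typing import List


def logistic_gradient(weights: List[float], xs: List[List[float]], ys: List[int]) -> List[float]:
    """Total gradient of the log loss w.r.t. the weights over all local examples."""
    grad = [0.0] * len(weights)
    for x, y in zip(xs, ys):
        z = sum(w * xi for w, xi in zip(weights, x))
        p = 1.0 / (1.0 + math.exp(-z))
        for j, xj in enumerate(x):
            # Each term is proportional to the word count x_j.
            grad[j] += (p - y) * xj
    return grad


vocab = ["cat", "dog", "fish", "bird", "tree"]
# Two local "messages" as bag-of-words counts; "dog" and "tree" never appear.
local_data = [[1, 0, 2, 0, 0], [0, 0, 1, 1, 0]]
labels = [1, 0]

grad = logistic_gradient([0.0] * len(vocab), local_data, labels)
leaked = [word for word, g in zip(vocab, grad) if g != 0.0]
print(leaked)
```

In principle the contributions of several examples could cancel for a given word, so a zero entry does not strictly prove absence, but a non-zero entry can only come from a word that was actually typed.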
    [D] Open course or study material on signal processing for machine learning
    Hello, my colleagues and I are looking for courses or study material on speech processing for machine learning (signal processing, text-to-speech, etc.). It seems to us that these are harder to find than NLP or Computer Vision courses on YouTube and Coursera. Do any of you know some good lectures, slides, or study material related to signals? It can be anything from the basics of signal processing to applications in machine learning. Any help will be appreciated! Thank you. submitted by /u/HotRecognition0121 [link] [comments]  ( 1 min )
    [D] Do you use NLTK or spaCy for text preprocessing?
    Nowadays people just pass their text through the default tokenizers that ship with pretrained BERT or RoBERTa models from HuggingFace. But if you trained your own model, would you use spaCy or NLTK? NLTK seems kinda outdated. Also, do the (pretrained) HF tokenizers require some cleaning of the text beforehand (eliminating numbers, hashtags, etc.)? I.e., is there any need for the libraries from the title in this case? submitted by /u/Icy_Fisherman7187 [link] [comments]  ( 2 min )
    [D] Bachelor Thesis: Research question
    Hello dear ML-enthusiasts, herewith I turn to you as a newbie in the ML field, hoping to get inspiration for the formulation of my research question. I am interested in the great potential of ML (especially Neural Networks) and therefore I would like to write my thesis in this area to gain first insights in this revolutionary technology. In Germany, it is common as a bachelor thesis to conduct a literature analysis and to examine individual concepts, compare them and determine the state of the art. I have read about 20 papers and keep getting lost down a rabbit hole that is way beyond my expertise. I would be very grateful for any key words/concepts/ideas that are interesting that I could look up for further research. Thanks in advance :)) submitted by /u/Hundenberg [link] [comments]  ( 1 min )
  • Open

    Sequence length in LSTM
    In this PPO-LSTM architecture, at some point there is a sequence length variable (https://github.com/MarcoMeter/recurrent-ppo-truncated-bptt/blob/9206a97b7546ec62e668eaf67ae6d4b752e0f0ee/model.py#L79). If you look at the configs, it is always set to 8 in each environment (https://github.com/MarcoMeter/recurrent-ppo-truncated-bptt/blob/9206a97b7546ec62e668eaf67ae6d4b752e0f0ee/configs.py#L15). It is described as "length of the fed sequences". Can you make an example so that it is clearer what this is referring to? Thanks! submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
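A small illustration of what a "sequence length" of 8 usually means in truncated backpropagation through time; this is a sketch, not the linked repository's code. An episode is cut into chunks of 8 consecutive steps, the LSTM is unrolled over one chunk at a time, and gradients do not flow past the chunk boundary. Zero-padding of the last chunk and the helper name `split_into_sequences` are assumptions.

```python
from typing import List


def split_into_sequences(episode: List[int], sequence_length: int, pad_value: int = 0) -> List[List[int]]:
    """Cut one episode into fixed-length chunks, padding the last chunk."""
    sequences = []
    for start in range(0, len(episode), sequence_length):
        chunk = episode[start:start + sequence_length]
        chunk = chunk + [pad_value] * (sequence_length - len(chunk))
        sequences.append(chunk)
    return sequences


episode = list(range(1, 21))  # 20 time steps, labelled 1..20
sequences = split_into_sequences(episode, sequence_length=8)
for seq in sequences:
    print(seq)
```

With 20 steps this gives three sequences: steps 1 to 8, steps 9 to 16, and steps 17 to 20 followed by four padding entries.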
    Preprocessing layers applied to an observation space that contains direction and position of the agent as well as a global observation
    I'm trying to figure out what this code is doing but I have a hard time understanding the observation space here. The observation space is passed as an input, then there are some preprocessing layers (built as a dictionary) in which: - The key "policy state" (which is the hidden state of the LSTM that is controlling the policy) is associated to a few linear layers - The key "image" is associated to conv nets - The key "direction" (which is assigned to scalar name) is associated to a fully connected layer - The key "position" is associated to a single fully connected layer as well. What I don't understand is: how do you get such an observation space, containing position and direction of the agent as well as an observation of the environment? ## Linear layer preprocessing_layers = { "policy_state": torch.nn.Linear.Lambda(lambda x: x) } ## A convolution layer processes the image preprocessing_layers["image"] = torch.nn.Sequential([ torch.nn.Conv2d(obs_space.shape[0], conv_filters, 1), torch.nn.ReLU()]) ## The scalar inputs are processed with a single fully connected layer of size 5 ## Direction that the agent is facing if scalar_name in obs_space: preprocessing_layers[scalar_name] = torch.nn.Sequential( [utils.one_hot_layer(scalar_dim), torch.nn.Linear(scalar_fc)]) ## Position of the agent if "position" in obs_space: preprocessing_layers["position"] = torch.nn.Sequential([torch.nn.Linear(scalar_fc)]) submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
    Question about Multi agent reinforcement learning
    Is it possible for two agents to become smarter by competing against each other in an environment, given that they know nothing about the environment at the beginning? submitted by /u/Professional_Card176 [link] [comments]  ( 1 min )
    research on object manipulation
    Can someone please tell me what the state of research on object manipulation with robotic arms is? Is it worth starting research on this topic, and is it a worthwhile topic for the future? submitted by /u/Western-Age3148 [link] [comments]
    Q-Learning on OpenAI Gym Never Gets Reward
    I am very new to reinforcement Q-learning and I recently got through a tutorial based on https://gym.openai.com/envs/FrozenLake-v0/. The tutorial link is here: https://deeplizard.com/learn/video/HGeI30uATws. I went through the tutorial and I just could not get what it promised to work. Basically, after doing everything listed, I should be able to make my agent play Frozen Lake with about 70% accuracy. However, all I get is 0% accuracy. Naturally, before asking here, I would debug to see what is wrong, and I have some suspects. I see that for each episode, my agent caps out at the iteration I defined, thereby receiving no reward: Episode 9913 ended after 100 iterations. Rewards = 0.0 Episode 9914 ended after 100 iterations. Rewards = 0.0 Episode 9915 ended after 100 iterations. Rewards = 0…  ( 7 min )
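For reference while debugging, this is a minimal sketch of the tabular Q-learning update that tutorials of this kind are built around. It is not the tutorial's code, and the hyperparameter values are placeholders. The table shape matches the 4x4 Frozen Lake grid (16 states, 4 actions).

```python
def q_update(q_table, state, action, reward, next_state, done, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step: move Q(s, a) toward the TD target."""
    # No bootstrapping from the next state once the episode has ended.
    best_next = 0.0 if done else max(q_table[next_state])
    target = reward + gamma * best_next
    q_table[state][action] += alpha * (target - q_table[state][action])
    return q_table[state][action]


# 16 states x 4 actions, all initialised to zero.
q_table = [[0.0] * 4 for _ in range(16)]
# Reaching the goal (state 15) from state 14 with action 2 yields reward 1.
new_value = q_update(q_table, state=14, action=2, reward=1.0, next_state=15, done=True)
print(new_value)
```

Since the only reward in Frozen Lake comes from reaching the goal, the table stays at zero until some episode actually gets there, so it is worth checking that exploration is happening and that episodes can end before the step cap.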
    "Concurrent Training of a Control Policy and a State Estimator for Dynamic and Robust Legged Locomotion", Ji et al 2022
    submitted by /u/gwern [link] [comments]  ( 1 min )
    Gym Custom Environment action_space
    Hi everyone, I am new to RL and I am currently working on a custom environment with gym. I want my action to be either -1 or 1. How do I set the action_space for that? submitted by /u/Select-Blackberry451 [link] [comments]  ( 1 min )
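One common pattern, sketched here without importing gym so that it runs anywhere: declare the action space as gym.spaces.Discrete(2), so the agent emits the index 0 or 1, and translate that index into -1 or 1 inside the environment's step method. The helper name `decode_action` is hypothetical.

```python
# Inside the custom environment one would typically declare
#   self.action_space = gym.spaces.Discrete(2)
# so the agent emits 0 or 1; the mapping below would be applied in step().
ACTION_VALUES = (-1, 1)


def decode_action(action_index: int) -> int:
    """Translate a Discrete(2) index (0 or 1) into the value -1 or 1."""
    if action_index not in (0, 1):
        raise ValueError(f"invalid action index: {action_index}")
    return ACTION_VALUES[action_index]


print([decode_action(i) for i in (0, 1)])
```

Keeping the space discrete tends to work better with value-based agents such as DQN than a continuous Box between -1 and 1 would.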
  • Open

    Process larger and wider datasets with Amazon SageMaker Data Wrangler
    Amazon SageMaker Data Wrangler reduces the time to aggregate and prepare data for machine learning (ML) from weeks to minutes in Amazon SageMaker Studio. Data Wrangler can simplify your data preparation and feature engineering processes and help you with data selection, cleaning, exploration, and visualization. Data Wrangler has over 300 built-in transforms written in PySpark, […]  ( 5 min )
    Fine-tune transformer language models for linguistic diversity with Hugging Face on Amazon SageMaker
    Approximately 7,000 languages are in use today. Despite attempts in the late 19th century to invent constructed languages such as Volapük or Esperanto, there is no sign of unification. People still choose to create new languages (think about your favorite movie character who speaks Klingon, Dothraki, or Elvish). Today, natural language processing (NLP) examples are […]  ( 13 min )
    Build a custom Q&A dataset using Amazon SageMaker Ground Truth to train a Hugging Face Q&A NLU model
    In recent years, natural language understanding (NLU) has increasingly found business value, fueled by model improvements as well as the scalability and cost-efficiency of cloud-based infrastructure. Specifically, the Transformer deep learning architecture, often implemented in the form of BERT models, has been highly successful, but training, fine-tuning, and optimizing these models has proven to be […]  ( 19 min )
  • Open

    ANN in Keras
    Hello everyone, I am trying to make a Keras neural network model. I have made a custom gym environment, and since I am new to machine learning, I am confused about how many layers to use and which ones. My observation space is: self.observation_space = spaces.Box(low=-1, high=1, shape=(4,6), dtype=np.float16). My action space is a discrete value of 2. I am trying to implement DQN and I get the following error: DQN expects a model that has one dimension for each action, in this case 2. I think the problem is with my observation space and action space dimensions, but I am unsure how to fix it. Any help is deeply appreciated! submitted by /u/Select-Blackberry451 [link] [comments]  ( 1 min )
    Neural Network to approximate function
    Hello, I am trying to create a neural network to approximate a function (the Rosenbrock function). I want to use the toolbox provided by MATLAB, but I don't know which function to use. Can someone help me? The Rosenbrock function looks like this: f(x,y) = (1 - x)^2 + 100 (y - x^2)^2. Hope someone is willing to help me. submitted by /u/TobiasFred [link] [comments]  ( 1 min )
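Whatever toolbox ends up fitting the network, the first step is a set of input/target pairs sampled from the function. The sketch below is in Python rather than MATLAB and covers only that data-generation step. The sampling range of [-2, 2] and the helper name `make_dataset` are assumptions chosen for illustration.

```python
import random


def rosenbrock(x: float, y: float) -> float:
    """Rosenbrock function with the usual constants a = 1, b = 100."""
    return (1 - x) ** 2 + 100 * (y - x ** 2) ** 2


def make_dataset(n_samples: int, low: float = -2.0, high: float = 2.0, seed: int = 0):
    """Sample (x, y) uniformly from a square and evaluate the function."""
    rng = random.Random(seed)
    inputs = [(rng.uniform(low, high), rng.uniform(low, high)) for _ in range(n_samples)]
    targets = [rosenbrock(x, y) for x, y in inputs]
    return inputs, targets


inputs, targets = make_dataset(1000)
print(rosenbrock(1.0, 1.0), rosenbrock(0.0, 0.0))
```

The targets span several orders of magnitude on this range, so rescaling them (or fitting the logarithm of one plus the target) before training is often worth trying.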
    NN from Scratch: #6 Building the model | Kolbenkraft
    submitted by /u/cjmodi306 [link] [comments]
  • Open

    Cloud Document Management Is the Future — Here’s Why
    Organizations today have been using some form of document management for years, whether on paper, computer, or online. While we at Bentech…  ( 2 min )
  • Open

    Escaping the Big Data Paradigm with Compact Transformers
    Highlights  ( 2 min )

  • Open

    Use custom vocabulary in Amazon Lex to enhance speech recognition
    In our daily conversations, we come across new words or terms that we may not know. Perhaps these are related to a new domain that we’re just getting familiar with, and we pick these up as we understand more about the domain. For example, home loan terminology (“curtailment”), shortened words, (“refi”, “comps”), and acronyms (“HELOC”) […]  ( 6 min )
    Predict customer churn with no-code machine learning using Amazon SageMaker Canvas
    Understanding customer behavior is top of mind for every business today. Gaining insights into why and how customers buy can help grow revenue. But losing customers (also called customer churn) is always a risk, and insights into why customers leave can be just as important for maintaining revenues and profits. Machine learning (ML) can help […]  ( 9 min )
  • Open

    Accuracy of prediction
    Is there any technique for a NN to estimate the accuracy of its prediction? I do not mean just listing how close to 1 the output cells are in a classification, but actually giving a probability of the result being correct. I'm thinking of predicting time series, but I would like a more general answer. submitted by /u/RexurrectionOfDoom [link] [comments]
    does anyone have links to papers that use Neural-nets to predict the weather?
    submitted by /u/20io_anarchist [link] [comments]
    CNN applied on EEG data - Better to feed the network Raw data or Power Spectrum Density (PSD)
    I'm analyzing EEG data using a convolutional neural network. I could feed the network either the raw data (with some preprocessing), or I could estimate the PSD using for instance Welch's method, and instead feed this to the network. Both implementations exist in academic literature, but it's never well explained why one would choose one or the other. I get that using the PSD inherently applies some filtering, and that this might improve feature extraction by the CNN in some cases, but which cases? And why? A ready-made answer would be amazing, but I would already be very grateful for some links towards resources that cover this. submitted by /u/Qosarom [link] [comments]  ( 1 min )
    Nndl applications in defense
    Does anyone know how CNNs are used in defense applications? submitted by /u/Actual-Performer-832 [link] [comments]
    Architecture suggestion for multi-class classification task on images with captions
    I am currently starting on a multi-class classification task on images that come with a short description. We are given training data consisting of images, their captions, and labels, and we need to construct an architecture that classifies a new image with its caption. I have done some research online, and the suggestion was that the best architecture for multi-class classification of images would be a combination of a CNN and an RNN, but I couldn't find anything on how to use the captions together with the images for training. Any advice on where I should start? submitted by /u/anzhuoxianshen [link] [comments]  ( 1 min )
  • Open

    How do you initialize a recurrent layer if you don't know the exact size of the input yet?
    Let's say I have a function that processes my observation in a way that makes it difficult to predict the size of the input for the subsequent recurrent layer. In the init method, how do I initialize such a recurrent layer? submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
    PongNoFrameskip agent learns to do nothing
    Hello! Currently I'm working on a DQN agent for the PongNoFrameskip-v0/4 Atari game. The network itself is really simple; it's the same as in almost all of the tutorials (3 Conv+ReLU and 2 FC(512)). I followed several tutorials and created my own solution for the problem, but for some reason my agent only learns to do nothing after 200 episodes (~190k steps). The tutorial codes are working well (with the same hyperparameters and frame processing as my agent); their agent starts to learn something after 120k episodes, and I can't figure out what I am doing wrong. Maybe it is a small thing that I just can't see. Can anyone who is more advanced in the topic help me with it? I can provide the code; it can run on Colab. submitted by /u/Drotos7 [link] [comments]  ( 1 min )
    queries regarding doubts on research papers
    Can someone suggest a platform where I can ask questions about a particular paper on deep reinforcement learning? submitted by /u/Western-Age3148 [link] [comments]
    What happens if you don't mask the hidden states of a recurrent policy?
    What happens if you don't reset the hidden states to zero when the environment is done during training? submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
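A toy illustration of what the mask does, with a made-up one-line "RNN" in place of a real LSTM; the function names are hypothetical. Multiplying the hidden state by zero when an episode ends prevents memory from the old episode from carrying into the first steps of the new one.

```python
from typing import List


def mask_hidden_state(hidden: List[float], done: bool) -> List[float]:
    """Zero the recurrent state at an episode boundary, otherwise keep it."""
    keep = 0.0 if done else 1.0
    return [h * keep for h in hidden]


def toy_rnn_step(hidden: List[float], observation: float) -> List[float]:
    """Stand-in for a recurrent cell: decay the old state and add the input."""
    return [0.5 * h + observation for h in hidden]


hidden = [0.0, 0.0]
trajectory = [(1.0, False), (1.0, True), (0.0, False)]  # (observation, done)
for observation, done in trajectory:
    hidden = toy_rnn_step(hidden, observation)
    hidden = mask_hidden_state(hidden, done)
print(hidden)
```

Without the mask, the last step would start from a state of 1.5 left over from the finished episode and end at 0.75 instead of 0.0, so the policy would be conditioning on events that belong to a different episode.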
    in MARL, does sharing a policy mean sharing the action space?
    I am a bit confused. In this repository (https://github.com/zoeyuchao/mappo), the authors claim that "by default all experiments assume a shared policy by all agents, i.e. there is one neural net shared by all agents". However, in the code I see that a) there is the option of sharing the observation space and b) there is no option of sharing the action space. So what does that sentence mean exactly? Thanks! submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
    input to the target networks in actor critic algorithms
    What would be the input to the target networks when we use them for object manipulation with a robotic arm, and how would this input differ from that of the main actor-critic networks? submitted by /u/Western-Age3148 [link] [comments]  ( 1 min )
    MADDPG: model does not perform well enough
    I'm using the MADDPG algorithm to control traffic signals in an urban network. I expected it to perform better than traditional methods (e.g., timings solved by Synchro), but it did not. Also, after training for 1000 episodes, the result is not much better than the output of the freshly initialized network without noise. I suspect my model did not learn anything, but I have no idea how to solve this problem. Here are the figures for the average episodic reward and each agent's critic loss. The losses dropped rapidly in the first 200 episodes and then stayed flat until the end. The average reward converged gradually over time, but that may be because the noise was gradually going down, not because the model successfully learned a better policy. [Figures: each agent's critic loss; episodic reward] I've uploaded my code to my GitHub and hope someone can take a look and give me some direction on how to modify it properly. submitted by /u/ntuce002 [link] [comments]  ( 1 min )
    Advice regarding Simulation Environment for Master Thesis
    Hi all, I recently started my Master Thesis, in which I have to implement different RL agents to control a multi-axis machine that fits puzzle pieces into their designated positions, and evaluate their performance over the next 22 weeks. I was given a simulation environment in Unity which is by far insufficient to reflect the complexity and physics of the task, hence I will have to either rebuild a simulation env. from scratch or adjust the given one. In recent weeks I familiarized myself a bit with Unity & MuJoCo and now have to decide between: (1) work with & adjust the existing Unity env.; (2) build a new one in MuJoCo (or maybe pybullet); (3) work with & adjust the existing Unity env. + use the MuJoCo Unity plug-in for the physics engine. I should add that I am familiar with the deeper theory & math of the different RL algorithms (Q-Learning, Policy Gradients, SAC), but I do not have any practical implementation experience with them yet. My concern about building the environment completely from scratch is losing too much time on it, as my thesis is about the RL approaches and not the sim. env., but on the other hand it might pay off to invest that time to have a solid env. to train the agents on. Do you guys have any advice or experience with a similar situation? Cheers submitted by /u/disdisinform [link] [comments]  ( 2 min )
    Will a CNN be useful for training an agent to play Connect 4? I am thinking about implementing a CNN without pooling because I don't want to lose any information the agent could learn from. Is a 4x4 kernel a good start? (new to CNNs)
    submitted by /u/Professional_Card176 [link] [comments]  ( 2 min )
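As a quick sanity check on kernel sizes, the standard output-size formula for a convolution can be evaluated for the 6x7 Connect 4 board. This is only arithmetic, not a recommendation for a particular architecture; the function name is hypothetical.

```python
def conv_output_size(size: int, kernel: int, padding: int = 0, stride: int = 1) -> int:
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


rows, cols = 6, 7  # standard Connect 4 board
print(conv_output_size(rows, 4), conv_output_size(cols, 4))
print(conv_output_size(rows, 3, padding=1), conv_output_size(cols, 3, padding=1))
```

A single unpadded 4x4 kernel already shrinks the board to 3x4, whereas a 3x3 kernel with padding 1 keeps it at 6x7, which leaves room to stack several layers without pooling.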
    What does the big E (expected value..?) in Bellman's equation really mean?
    submitted by /u/Spencerbug [link] [comments]  ( 1 min )
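A small worked example of the expectation, with made-up numbers: the E in the Bellman equation is a probability-weighted average over everything that can happen next, here over the possible next states and rewards after taking one action. The state names, values, and probabilities are all hypothetical.

```python
def expected_backup(transitions, values, gamma=0.9):
    """E[r + gamma * V(s')] as a probability-weighted sum over outcomes.

    transitions: list of (probability, reward, next_state) tuples.
    """
    return sum(p * (r + gamma * values[s_next]) for p, r, s_next in transitions)


values = {"A": 10.0, "B": 0.0}
# The action lands in A with probability 0.8 (reward 1) and in B otherwise (reward 0).
transitions = [(0.8, 1.0, "A"), (0.2, 0.0, "B")]
result = expected_backup(transitions, values)
print(result)
```

Here the result is 0.8 * (1 + 0.9 * 10) + 0.2 * (0 + 0.9 * 0) = 8. Sample-based methods such as Q-learning estimate the same average from observed transitions instead of known probabilities.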
  • Open

    [P] Can anyone suggest a free image annotation tool for multi-labelling?
    I am annotating handwritten text. By multi-label I mean that I am storing multiple pieces of information with a single bounded component. Let's say I am classifying a text as "clean text" or "unclean text". Then, for clean text, the content of the text, its language, and whether or not it is a math expression are stored as well. I have been using Plainsight, and its sublabel feature was doing fine, but I just noticed that it is paid after a month of free use. I am curious whether LabelMe can do this, as the tutorial does not explicitly say that a single component can have multiple pieces of information or classes associated with it. submitted by /u/slowturtle56 [link] [comments]  ( 1 min )
    [D] What venues accept high quality surveys?
    I am specifically asking about high-profile applications within the Computer Vision or Deep Learning space. I am just curious what sort of venue would be appropriate for a survey? submitted by /u/AbjectDrink3276 [link] [comments]  ( 1 min )
    [D] This Ape Does Not Exist! I trained a StyleGAN2 on the Bored Ape Yacht Club NFT Collection (YouTube Video)
    https://youtu.be/Pm93D8CVlY8 Today we build our own AI that can create as many bored apes as we want! Fungibility for everyone! OUTLINE: 0:00 - Introduction 2:05 - Generative Adversarial Networks 3:40 - Scraping Opensea with BrightData 7:55 - Training the GAN 11:35 - Here are the results! 15:20 - Diving deeper into BrightData Try the model here: https://huggingface.co/spaces/ykilcher/apes or here: https://ykilcher.com/apes Files & Models here: https://huggingface.co/ykilcher/apes/tree/main Code here: https://github.com/yk/apes-public submitted by /u/ykilcher [link] [comments]  ( 1 min )
    [P] Start-up puts an end to hassle around vehicle damage inspections
    Lensor uses cameras and machine learning to inspect a car in 7 seconds, making over 200 photos. These are checked by a system that was trained on a huge collection of photos of damage. In the future this will replace walking around the car at the rental station, filling in forms. submitted by /u/DutchTechJunkie [link] [comments]  ( 1 min )
    [R] Scaled up CLIP-like model (~2B) shows 86% Zero-shot on Imagenet
    Paper: https://arxiv.org/abs/2205.01917 Impressive performance on diverse datasets may indicate higher generalizability :) [Without "task-specific" customizations] Confirms that multi-modal models can scale further from single-digit billions of params (who would've thought) and scales up a simple CLIP-like model, showing substantial improvements, especially in the zero-shot domain. Simple contrastive learning appears more and more promising for multi-modal objectives... Overall, it's nothing novel, simply scaled-up research. submitted by /u/Competitive-Rub-1958 [link] [comments]  ( 2 min )
    [R] ExSum: From Local Explanations to Model Understanding
    Excited to share our latest research on model interpretability, to appear at NAACL this summer. In this paper, we reflect on local model explanations (e.g. LIME, SHAP, gradient saliency), and think about how people actually use them to derive high-level model understanding (e.g. is the model relying on spurious correlation, is it biased, can I trust it). Obviously, they need to be correct (or faithful), which has been the focus of many interpretability evaluations. However, we argue that they also need to be understandable, and propose explanation summary (ExSum), the first mathematical framework to quantify this understandability aspect. Using it, we demonstrate that a formal practice of quantifying model understanding leads to better and more awareness of subtle model behaviors that would be easily missed if we were just to inspect a few local explanations in an ad hoc way. Happy to answer any questions. Paper: https://arxiv.org/pdf/2205.00130.pdf Project website: https://yilunzhou.github.io/exsum/ MIT News release: https://news.mit.edu/2022/machine-learning-explainability-0505 submitted by /u/zyl1024 [link] [comments]  ( 2 min )
    [D] what is a good model choice for detection of small numbers in a picture (followed by classification)
    I want to detect jersey numbers on the backs of soccer players. My current model uses an STN, but I am afraid it is not the best approach. I know that there are plenty of different object detection models, but I don't know enough to make a choice. My data has the bounding box coordinates and the correct number, so I could give double supervision. submitted by /u/TheManveru [link] [comments]  ( 1 min )
    [D] Why is NLP given so much attention and resources?
    I don't really see any models, besides DeepMind's AlphaGo, MuZero, etc., gain the attention or resources that NLP projects like GPT-3 and others get, so I am curious as to why that is, and why there aren't many other model "types", I guess, with as many parameters as the NLP ones. Is it due to clearer applications and a more probable ROI? Is there a view that NLP models like these have a higher chance of leading to more general intelligence, given their use in code generation and other problem-solving tasks? submitted by /u/Southern-Trip-1102 [link] [comments]  ( 4 min )
    [D] Scaling in Multivariate/Multi-output regression
    Hello, I have been trying to improve the accuracy of my model. Model: Input is a set of measurements (one set in mm and another set in degrees) Output is a set of predictions (again one set in mm and another set in degrees) Now as you can imagine, there are times when the data has completely different scales. I wanted to know what the effect of scaling the input data has on the outcome. I'm using sklearn's MultiOutputRegressor on top of lgbm.LGBMRegressor Also, is there a better way to score my model? The default R² score is fine, but when I try to pass a custom callback to model.fit() with the keyword eval_metric, it isn't being used. submitted by /u/translunarinjection [link] [comments]  ( 1 min )
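    One way to look at the scoring part of this question is to compute R² separately for each output, so that the mm and degree targets are judged on their own. The sketch below is plain Python with made-up numbers; the function name and data are hypothetical and it is not sklearn's or LightGBM's implementation. It also shows that R² for a single output does not change when that output is rescaled (for example mm to m).

```python
def r2_per_output(y_true, y_pred):
    """Return one R^2 value per output column; rows are samples."""
    n_outputs = len(y_true[0])
    scores = []
    for j in range(n_outputs):
        t = [row[j] for row in y_true]
        p = [row[j] for row in y_pred]
        mean_t = sum(t) / len(t)
        ss_res = sum((a - b) ** 2 for a, b in zip(t, p))   # residual sum of squares
        ss_tot = sum((a - mean_t) ** 2 for a in t)         # total sum of squares
        scores.append(1.0 - ss_res / ss_tot)
    return scores

# Hypothetical targets: column 0 in mm, column 1 in degrees.
y_true = [[100.0, 1.0], [200.0, 2.0], [300.0, 3.0]]
y_pred = [[110.0, 1.0], [190.0, 2.5], [300.0, 3.0]]
scores = r2_per_output(y_true, y_pred)

# Same data with column 0 converted from mm to m: its R^2 is unchanged.
to_m = lambda rows: [[r[0] / 1000.0, r[1]] for r in rows]
scores_rescaled = r2_per_output(to_m(y_true), to_m(y_pred))
```

    Reporting the per-output values side by side makes it easier to see which target drives a low averaged score.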
    [D] Help: Finding a paper about self-supervised pretraining being limited on some datasets and not yielding good results on some
    Any literature that explains that self-supervised pretraining sometimes doesn't work on other datasets? I have used SwAV and SimCLR, but they do not perform better and sometimes perform worse. submitted by /u/sarmientoj24 [link] [comments]  ( 1 min )
    GAN Literature Review Recommendations? [D]
    Hi, I'm looking to do a lit review of GANs before starting a project - only problem is I've lost touch with developments in the last 2-3 years. Does anyone have any papers or key advancements they could recommend to me within the last 3 years or so? Would appreciate any input. Thanks submitted by /u/blahbloopooo [link] [comments]  ( 1 min )
    [D] What is in your perfect dev environment?
    Regardless of where you're running, what are the must-have tools in your ML arsenal (and perhaps why you need them so much)? This is the stuff that you would want pre-installed on whatever environment you'd want to use, whether that's a container, VM, or bare metal server. The reason I ask is because I'm looking for ways to make the default run image on RunPod a bit more friendly. Or maybe it makes sense to have different environments for different major use cases? Is it a fool's errand to try to cram everything into one, or even a few, buckets? Let me know in the comments! TIA :) submitted by /u/runpod-io [link] [comments]  ( 2 min )
  • Open

    OpenAI Leadership Team Update
    We’re happy to announce several executive role changes that reflect our recent progress and will ensure continued momentum toward our next major milestones.  ( 1 min )
  • Open

    Last Week in AI: AI Kills Cookie Pop-Ups, Models Volcanoes, Screens for Child Neglect, Paints Harry Potter
    submitted by /u/regalalgorithm [link] [comments]
    Iterative Introduces New ML Tool TPI Plugin For HashiCorp's Terraform Cloud Infrastructure Service
    submitted by /u/thumbsdrivesmecrazy [link] [comments]
    Exciting Data Science Project Ideas To Brush Up Your Skills
    Understanding Data Science can be quite confusing at first, but with constant practice, you can soon begin to grasp the various notions and terminologies in the subject. The best way to gain more exposure to Data Science apart from going through the literature is to take on some helpful projects which will not only upskill you but will also make your resume more impressive. Here are some new-fangled data science project ideas that zealous data science professionals can pick from: https://betterprogramming.pub/exciting-data-science-project-ideas-to-brush-up-your-skills-54475993d413 submitted by /u/saik2363 [link] [comments]  ( 1 min )
    AI in cybersecurity - for good and bad
    Would love to hear any thoughts on AI and ML in cybersecurity, whether helping attackers or defenders - anyone work in CS? I just published a podcast about artificial intelligence and its growing influence on cybersecurity, both as an adversarial tool (deepfakes, social engineering etc.) and as a way to detect anomalies and protect websites etc. The guest is Elaine Lee, a data scientist working at Mimecast to use AI to stop spam and phishing attacks. https://open.spotify.com/episode/2BG4c0qyXJsjvOE5nqifhe?si=a318ca529e164174 submitted by /u/AgentLessBots [link] [comments]  ( 1 min )
    Any free chat bots that have URL endpoints to get their responses?
    submitted by /u/antoniscool28 [link] [comments]  ( 1 min )
    Have any researchers in the field discussed anything about the prospect of 'text-to-video' - something that's a bit like DALL-E 2, but with a video as the finished output?
    I'm wondering if any of the researchers who are in this field have spoken about the idea of (eventually) creating something where you would end up with some kind of short video at the end of it. submitted by /u/brick_eater [link] [comments]  ( 1 min )
    Support for macOS Added in MindSpore 1.6
    submitted by /u/Creative_Habit_6868 [link] [comments]  ( 5 min )
  • Open

    Learning Locomotion Skills Safely in the Real World
    Posted by Jimmy (Tsung-Yen) Yang, Student Researcher, Robotics at Google The promise of deep reinforcement learning (RL) in solving complex, high-dimensional problems autonomously has attracted much interest in areas such as robotics, game playing, and self-driving cars. However, effectively training an RL policy requires exploring a large set of robot states and actions, including many that are not safe for the robot. This is a considerable risk, for example, when training a legged robot. Because such robots are inherently unstable, there is a high likelihood of the robot falling during learning, which could cause damage. The risk of damage can be mitigated to some extent by learning the control policy in computer simulation and then deploying it in the real world. However, this approac…  ( 9 min )
  • Open

    Driver’s Ed: How Waabi Uses AI, Simulation to Teach Autonomous Vehicles to Drive
    Teaching the AI brains of autonomous vehicles to understand the world as humans do requires billions of miles of driving experience. The road to achieving this astronomical level of driving leads to the virtual world. On the latest episode of the AI Podcast, Waabi CEO and founder Raquel Urtasun joins NVIDIA’s Katie Burke Washabaugh to […] The post Driver’s Ed: How Waabi Uses AI, Simulation to Teach Autonomous Vehicles to Drive appeared first on NVIDIA Blog.  ( 2 min )
    GFN Thursday Caught in 4K: 27 Games Arriving on GeForce NOW in May, Alongside 4K Streaming to PC and Mac Apps
    Enjoy the finer things in life. May is looking pixel perfect for GeForce NOW gamers. RTX 3080 members can now take their games to the next level, streaming at 4K resolution on the GeForce NOW PC and Mac native apps — joining 4K support in the living room with SHIELD TV. There’s also a list […] The post GFN Thursday Caught in 4K: 27 Games Arriving on GeForce NOW in May, Alongside 4K Streaming to PC and Mac Apps appeared first on NVIDIA Blog.  ( 5 min )
  • Open

    Setting Breakpoints and Exception Hooks in Python
    There are different ways of debugging code in Python, one of which is to introduce breakpoints into the code at […] The post Setting Breakpoints and Exception Hooks in Python appeared first on Machine Learning Mastery.  ( 14 min )
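    As a rough illustration of the exception-hook half of this topic (the linked article's own code is not shown in this feed), the sketch below installs a custom `sys.excepthook` that records the formatted traceback of an uncaught exception. The `captured` list and the `log_uncaught` name are hypothetical; breakpoints can be placed with the built-in `breakpoint()` (Python 3.7+), which is not called here because it would pause the script.

```python
import sys
import traceback

captured = []  # hypothetical in-memory log, for illustration only

def log_uncaught(exc_type, exc_value, exc_tb):
    """Store the formatted traceback instead of printing it to stderr."""
    captured.append("".join(traceback.format_exception(exc_type, exc_value, exc_tb)))

previous_hook = sys.excepthook
sys.excepthook = log_uncaught      # Python now calls this for uncaught exceptions

try:
    1 / 0
except ZeroDivisionError:
    # Invoke the hook by hand so the demo does not terminate the script.
    sys.excepthook(*sys.exc_info())

sys.excepthook = previous_hook     # restore the original hook
```

    In a real script the hook would typically write to a log file or start a post-mortem debugger rather than append to a list.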
  • Open

    Azure Quantum innovation: Efficient error correction of topological qubits with Floquet codes
    Technological innovation that enables scaling of quantum computing underpins the Microsoft Azure Quantum program. In March of this year, we announced our demonstration of the underlying physics required to create a topological qubit—qubits that are theorized to be inherently more stable than existing ones without sacrificing size or speed. However, our quest to deliver a […] The post Azure Quantum innovation: Efficient error correction of topological qubits with Floquet codes appeared first on Microsoft Research.  ( 7 min )
  • Open

    Implementing an AI project the right way: Here’s how it works
    Do you want to reduce costs and introduce more efficient workflows in your company? Then you may have thought about using artificial…  ( 5 min )
  • Open

    DSC Weekly Newsletter 03 May 2022: How Many Meetings Do We Need?
    One of the more frustrating side effects of the Long Pandemic has been the rise in virtual meetings. I recently was talking with a colleague when the subject of meetings came up. “I swear that I spend much of most days in meetings,” she bemoaned. “It wouldn’t be so bad, but so many of them… The post DSC Weekly Newsletter 03 May 2022: How Many Meetings Do We Need? appeared first on Data Science Central.  ( 7 min )
  • Open

    Unpacking black-box models
    Researchers create a mathematical framework to evaluate explanations of machine-learning models and quantify how well people understand them.  ( 6 min )
  • Open

    John Conway and mental exercise rituals
    John Horton Conway (1937–2020) came up with an algorithm in 1973 for mentally calculating what day of the week a date falls on. His method, which he called the “Doomsday rule,” starts from the observation that every year, the dates 4/4, 6/6, 8/8, 10/10, 12/12, 5/9, 9/5, 7/11, and 11/7 fall on the same day […] John Conway and mental exercise rituals first appeared on John D. Cook.  ( 3 min )
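    The observation behind the rule is easy to check with Python's standard `datetime` module. The sketch below only verifies that the listed dates share a weekday within each year; it is not Conway's full mental algorithm, and the helper name is made up for this example.

```python
from datetime import date

# (month, day) pairs listed in the excerpt; 5/9 and 9/5 (and 7/11, 11/7)
# both appear, so the month/day reading order does not matter.
DOOMSDAY_DATES = [(4, 4), (6, 6), (8, 8), (10, 10), (12, 12),
                  (5, 9), (9, 5), (7, 11), (11, 7)]

def doomsday_weekdays(year):
    """Return the set of weekdays (Monday=0) these dates fall on in a year."""
    return {date(year, month, day).weekday() for month, day in DOOMSDAY_DATES}

# Every year in this range yields exactly one weekday for all nine dates.
all_agree = all(len(doomsday_weekdays(y)) == 1 for y in range(1900, 2101))
weekday_2022 = doomsday_weekdays(2022)  # {0}: a Monday in 2022
```

    All nine dates fall after February, which is why leap years do not break the pattern.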
  • Open

    The Grammar of Interactive Explanatory Model Analysis. (arXiv:2005.00497v4 [cs.LG] UPDATED)
    The growing need for in-depth analysis of predictive models leads to a series of new methods for explaining their local and global properties. Which of these methods is the best? It turns out that this is an ill-posed question. One cannot sufficiently explain a black-box machine learning model using a single method that gives only one perspective. Isolated explanations are prone to misunderstanding, leading to wrong or simplistic reasoning. This problem is known as the Rashomon effect and refers to diverse, even contradictory, interpretations of the same phenomenon. Surprisingly, most methods developed for explainable and responsible machine learning focus on a single-aspect of the model behavior. In contrast, we showcase the problem of explainability as an interactive and sequential analysis of a model. This paper proposes how different Explanatory Model Analysis (EMA) methods complement each other and discusses why it is essential to juxtapose them. The introduced process of Interactive EMA (IEMA) derives from the algorithmic side of explainable machine learning and aims to embrace ideas developed in cognitive sciences. We formalize the grammar of IEMA to describe potential human-model dialogues. It is implemented in a widely used human-centered open-source software framework that adopts interactivity, customizability and automation as its main traits. We conduct a user study to evaluate the usefulness of IEMA, which indicates that an interactive sequential analysis of a model increases the performance and confidence of human decision making.
    Dual Cross-Attention Learning for Fine-Grained Visual Categorization and Object Re-Identification. (arXiv:2205.02151v1 [cs.CV])
    Recently, self-attention mechanisms have shown impressive performance in various NLP and CV tasks, which can help capture sequential characteristics and derive global information. In this work, we explore how to extend self-attention modules to better learn subtle feature embeddings for recognizing fine-grained objects, e.g., different bird species or person identities. To this end, we propose a dual cross-attention learning (DCAL) algorithm to coordinate with self-attention learning. First, we propose global-local cross-attention (GLCA) to enhance the interactions between global images and local high-response regions, which can help reinforce the spatial-wise discriminative clues for recognition. Second, we propose pair-wise cross-attention (PWCA) to establish the interactions between image pairs. PWCA can regularize the attention learning of an image by treating another image as distractor and will be removed during inference. We observe that DCAL can reduce misleading attentions and diffuse the attention response to discover more complementary parts for recognition. We conduct extensive evaluations on fine-grained visual categorization and object re-identification. Experiments demonstrate that DCAL performs on par with state-of-the-art methods and consistently improves multiple self-attention baselines, e.g., surpassing DeiT-Tiny and ViT-Base by 2.8% and 2.4% mAP on MSMT17, respectively.
    Approximation of Images via Generalized Higher Order Singular Value Decomposition over Finite-dimensional Commutative Semisimple Algebra. (arXiv:2202.00450v4 [cs.LG] UPDATED)
    Low-rank approximation of images via singular value decomposition is well-received in the era of big data. However, singular value decomposition (SVD) is only for order-two data, i.e., matrices. It is necessary to flatten a higher order input into a matrix or break it into a series of order-two slices to tackle higher order data such as multispectral images and videos with the SVD. Higher order singular value decomposition (HOSVD) extends the SVD and can approximate higher order data using sums of a few rank-one components. We consider the problem of generalizing HOSVD over a finite dimensional commutative algebra. This algebra, referred to as a t-algebra, generalizes the field of complex numbers. The elements of the algebra, called t-scalars, are fixed-size arrays of complex numbers. One can generalize matrices and tensors over t-scalars and then extend many canonical matrix and tensor algorithms, including HOSVD, to obtain higher-performance versions. The generalization of HOSVD is called THOSVD. Its performance in approximating multi-way data can be further improved by an alternating algorithm. THOSVD also unifies a wide range of principal component analysis algorithms. To exploit the potential of generalized algorithms using t-scalars for approximating images, we use a pixel neighborhood strategy to convert each pixel to a "deeper-order" t-scalar. Experiments on publicly available images show that the generalized algorithm over t-scalars, namely THOSVD, compares favorably with its canonical counterparts.
    Axonal Delay As a Short-Term Memory for Feed Forward Deep Spiking Neural Networks. (arXiv:2205.02115v1 [cs.NE])
    In spiking neural networks (SNNs), information is propagated between adjacent biological neurons by spikes, which provides a computing paradigm with the promise of simulating the human brain. Recent studies have found that the time delay of neurons plays an important role in the learning process. Therefore, configuring the precise timing of the spike is a promising direction for understanding and improving the transmission process of temporal information in SNNs. However, most existing learning methods for spiking neurons focus on adjusting synaptic weights, while very little research has addressed axonal delay. In this paper, we verify the effectiveness of integrating time delay into supervised learning and propose a module that modulates the axonal delay through short-term memory. To this end, a rectified axonal delay (RAD) module is integrated with the spiking model to align the spike timing and thus improve the characterization learning ability of temporal features. Experiments on three neuromorphic benchmark datasets (NMNIST, DVS Gesture and N-TIDIGITS18) show that the proposed method achieves state-of-the-art performance while using the fewest parameters.
    Semi-Supervised Cascaded Clustering for Classification of Noisy Label Data. (arXiv:2205.02209v1 [cs.LG])
    The performance of supervised classification techniques often deteriorates when the data has noisy labels. Even the semi-supervised classification approaches have largely focused only on the problem of handling missing labels. Most of the approaches addressing the noisy label data rely on deep neural networks (DNN) that require huge datasets for classification tasks. This poses a serious challenge especially in process and manufacturing industries, where the data is limited and labels are noisy. We propose a semi-supervised cascaded clustering (SSCC) algorithm to extract patterns and generate a cascaded tree of classes in such datasets. A novel cluster evaluation matrix (CEM) with configurable hyperparameters is introduced to localize and eliminate the noisy labels and invoke a pruning criterion on cascaded clustering. The algorithm reduces the dependency on expensive human expertise for assessing the accuracy of labels. A classifier generated based on SSCC is found to be accurate and consistent even when trained on noisy label datasets. It performed better in comparison with the support vector machines (SVM) when tested on multiple noisy-label datasets, including an industrial dataset. The proposed approach can be effectively used for deriving actionable insights in industrial settings with minimal human expertise.
    Learning the temporal evolution of multivariate densities via normalizing flows. (arXiv:2107.13735v2 [stat.ML] UPDATED)
    In this work, we propose a method to learn multivariate probability distributions using sample path data from stochastic differential equations. Specifically, we consider temporally evolving probability distributions (e.g., those produced by integrating local or nonlocal Fokker-Planck equations). We analyze this evolution through machine learning assisted construction of a time-dependent mapping that takes a reference distribution (say, a Gaussian) to each and every instance of our evolving distribution. If the reference distribution is the initial condition of a Fokker-Planck equation, what we learn is the time-T map of the corresponding solution. Specifically, the learned map is a multivariate normalizing flow that deforms the support of the reference density to the support of each and every density snapshot in time. We demonstrate that this approach can approximate probability density function evolutions in time from observed sampled data for systems driven by both Brownian and Lévy noise. We present examples with two- and three-dimensional, uni- and multimodal distributions to validate the method.
    Deep Reinforcement Learning-Based Long-Range Autonomous Valet Parking for Smart Cities. (arXiv:2109.11661v3 [cs.LG] UPDATED)
    In this paper, to reduce the congestion rate at the city center and increase the quality of experience (QoE) of each user, the framework of long-range autonomous valet parking (LAVP) is presented, where an Autonomous Vehicle (AV) is deployed in the city, which can pick up and drop off users at their required spots, and then drive autonomously to the car park outside the city center. In this framework, we aim to minimize the overall distance traveled by the AV while guaranteeing that all users are served, i.e., picked up and dropped off at their required spots, by optimizing the path planning of the AV and the number of serving time slots. To this end, we first propose a learning-based algorithm, named the Double-Layer Ant Colony Optimization (DL-ACO) algorithm, to solve the above problem in an iterative way. Then, to make real-time decisions while considering the dynamic environment (i.e., the AV may pick up and drop off users from different locations), we further present a deep reinforcement learning (DRL) based algorithm, known as the deep Q network (DQN). The experimental results show that the DL-ACO and DQN-based algorithms both achieve considerable performance.
    Microgrid Day-Ahead Scheduling Considering Neural Network based Battery Degradation Model. (arXiv:2202.12416v2 [eess.SP] UPDATED)
    Battery energy storage systems (BESS) can effectively mitigate the uncertainty of variable renewable generation. Degradation is unpreventable for batteries such as the most popular Lithium-ion battery (LiB). The main causes of LiB degradation are loss of Li-ions, loss of electrolyte, and increase of internal resistance, which are hard to model and predict. In this paper, we propose a data-driven method to predict the battery degradation for a given scheduled battery operational profile. In particular, a neural network based battery degradation (NNBD) model is proposed to quantify the battery degradation with inputs of major battery degradation factors. When incorporating the proposed NNBD model into microgrid day-ahead scheduling (MDS), we can establish a battery degradation based MDS (BDMDS) model that can consider the equivalent battery degradation cost precisely. Since the proposed NNBD model is highly non-linear and non-convex, BDMDS would be very hard to solve. To address this issue, a neural network and optimization decoupled heuristic (NNODH) algorithm is proposed in this paper to effectively solve this neural network embedded optimization problem. Simulation results demonstrate that the proposed NNODH algorithm is able to obtain the optimal solution with the lowest total cost, including normal operation cost and battery degradation cost.
    Sequencer: Deep LSTM for Image Classification. (arXiv:2205.01972v1 [cs.CV])
    In recent computer vision research, the advent of the Vision Transformer (ViT) has rapidly revolutionized various architectural design efforts: ViT achieved state-of-the-art image classification performance using self-attention found in natural language processing, and MLP-Mixer achieved competitive performance using simple multi-layer perceptrons. In contrast, several studies have also suggested that carefully redesigned convolutional neural networks (CNNs) can achieve advanced performance comparable to ViT without resorting to these new ideas. Against this background, there is growing interest in what inductive bias is suitable for computer vision. Here we propose Sequencer, a novel and competitive architecture alternative to ViT that provides a new perspective on these issues. Unlike ViTs, Sequencer models long-range dependencies using LSTMs rather than self-attention layers. We also propose a two-dimensional version of the Sequencer module, where an LSTM is decomposed into vertical and horizontal LSTMs to enhance performance. Despite its simplicity, several experiments demonstrate that Sequencer performs impressively well: Sequencer2D-L, with 54M parameters, realizes 84.6% top-1 accuracy on only ImageNet-1K. Not only that, we show that it has good transferability and robust resolution adaptability on double resolution-band.
    Compound virtual screening by learning-to-rank with gradient boosting decision tree and enrichment-based cumulative gain. (arXiv:2205.02169v1 [q-bio.BM])
    Learning-to-rank, a machine learning technique widely used in information retrieval, has recently been applied to the problem of ligand-based virtual screening, to accelerate the early stages of new drug development. Ranking prediction models learn based on ordinal relationships, making them suitable for integrating assay data from various environments. Existing studies of rank prediction in compound screening have generally used a learning-to-rank method called RankSVM. However, they have not been compared with or validated against the gradient boosting decision tree (GBDT)-based learning-to-rank methods that have gained popularity recently. Furthermore, although the ranking metric called Normalized Discounted Cumulative Gain (NDCG) is widely used in information retrieval, it only determines whether the predictions are better than those of other models. In other words, NDCG is incapable of recognizing when a prediction model produces worse than random results. Nevertheless, NDCG is still used in the performance evaluation of compound screening using learning-to-rank. This study used the GBDT model with ranking loss functions, called lambdarank and lambdaloss, for ligand-based virtual screening; results were compared with existing RankSVM methods and GBDT models using regression. We also proposed a new ranking metric, Normalized Enrichment Discounted Cumulative Gain (NEDCG), which aims to properly evaluate the goodness of ranking predictions. Results showed that the GBDT model with learning-to-rank outperformed existing regression methods using GBDT and RankSVM on diverse datasets. Moreover, NEDCG showed that predictions by regression were comparable to random predictions in multi-assay, multi-family datasets, demonstrating its usefulness for a more direct assessment of compound screening performance.
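    To make the NDCG point concrete, here is a minimal sketch of NDCG with a linear gain and a log2 discount (one common variant; an exponential gain is also widely used). The function names and relevance scores are hypothetical, and the paper's proposed NEDCG is not reproduced here. The example shows that even the worst possible ordering of a small list gets a score well above zero, so the raw value alone says little about how a ranking compares with random.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain with linear gain; rank 0 is the top item."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg(relevances_in_predicted_order):
    """DCG of the predicted order divided by the DCG of the ideal order."""
    ideal = dcg(sorted(relevances_in_predicted_order, reverse=True))
    if ideal == 0:
        return 0.0
    return dcg(relevances_in_predicted_order) / ideal

best = ndcg([3, 2, 1, 0])    # perfect ordering
worst = ndcg([0, 1, 2, 3])   # fully reversed ordering, still about 0.61
```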
    Learning Mechanically Driven Emergent Behavior with Message Passing Neural Networks. (arXiv:2202.01380v2 [cs.LG] UPDATED)
    From designing architected materials to connecting mechanical behavior across scales, computational modeling is a critical tool in solid mechanics. Recently, there has been a growing interest in using machine learning to reduce the computational cost of physics-based simulations. Notably, while machine learning approaches that rely on Graph Neural Networks (GNNs) have shown success in learning mechanics, the performance of GNNs has yet to be investigated on a myriad of solid mechanics problems. In this work, we examine the ability of GNNs to predict a fundamental aspect of mechanically driven emergent behavior: the connection between a column's geometric structure and the direction that it buckles. To accomplish this, we introduce the Asymmetric Buckling Columns (ABC) dataset, a dataset comprised of three sub-datasets of asymmetric and heterogeneous column geometries where the goal is to classify the direction of symmetry breaking (left or right) under compression after the onset of instability. Because of complex local geometry, the "image-like" data representations required for implementing standard convolutional neural network based metamodels are not ideal, thus motivating the use of GNNs. In addition to investigating GNN model architecture, we study the effect of different input data representation approaches, data augmentation, and combining multiple models as an ensemble. While we were able to obtain good results, we also showed that predicting solid mechanics based emergent behavior is non-trivial. Because both our model implementation and dataset are distributed under open-source licenses, we hope that future researchers can build on our work to create enhanced mechanics-specific machine learning pipelines for capturing the behavior of complex geometric structures.
    Analysis of Temporal Difference Learning: Linear System Approach. (arXiv:2204.10479v3 [cs.LG] UPDATED)
    The goal of this technical note is to introduce a new finite-time convergence analysis of temporal difference (TD) learning based on stochastic linear system models. TD-learning is a fundamental reinforcement learning (RL) algorithm for evaluating a given policy by estimating the corresponding value function for a Markov decision process. While there has been a series of successful works in theoretical analysis of TD-learning, it was not until recently that researchers found some guarantees on its statistical efficiency by developing finite-time error bounds. In this paper, we propose a simple control theoretic finite-time analysis of TD-learning, which exploits linear system models and standard notions in linear system communities. The proposed work provides new simple templates for RL analysis, and additional insights on TD-learning and RL based on ideas in control theory.
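    For readers unfamiliar with the algorithm being analyzed, the following is a minimal sketch of the standard tabular TD(0) update for policy evaluation. It is not the paper's linear-system analysis; the two-state deterministic chain, step size, and function name are assumptions chosen so the true values (4/3 and 2/3) can be worked out by hand from V0 = 1 + 0.5·V1 and V1 = 0.5·V0.

```python
def td0_evaluate(transitions, gamma, alpha, steps, start=0):
    """Tabular TD(0): transitions maps state -> (next_state, reward)."""
    values = {state: 0.0 for state in transitions}
    state = start
    for _ in range(steps):
        next_state, reward = transitions[state]
        # TD error: bootstrapped target minus current estimate.
        td_error = reward + gamma * values[next_state] - values[state]
        values[state] += alpha * td_error
        state = next_state
    return values

# Hypothetical chain under a fixed policy: 0 -> 1 (reward 1), 1 -> 0 (reward 0).
chain = {0: (1, 1.0), 1: (0, 0.0)}
values = td0_evaluate(chain, gamma=0.5, alpha=0.1, steps=2000)
```

    With stochastic transitions the same update applies, but a constant step size then leaves residual noise around the true values.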
    Instant Neural Graphics Primitives with a Multiresolution Hash Encoding. (arXiv:2201.05989v2 [cs.CV] UPDATED)
    Neural graphics primitives, parameterized by fully connected neural networks, can be costly to train and evaluate. We reduce this cost with a versatile new input encoding that permits the use of a smaller network without sacrificing quality, thus significantly reducing the number of floating point and memory access operations: a small neural network is augmented by a multiresolution hash table of trainable feature vectors whose values are optimized through stochastic gradient descent. The multiresolution structure allows the network to disambiguate hash collisions, making for a simple architecture that is trivial to parallelize on modern GPUs. We leverage this parallelism by implementing the whole system using fully-fused CUDA kernels with a focus on minimizing wasted bandwidth and compute operations. We achieve a combined speedup of several orders of magnitude, enabling training of high-quality neural graphics primitives in a matter of seconds, and rendering in tens of milliseconds at a resolution of $1920 \times 1080$.
    Modeling Task Interactions in Document-Level Joint Entity and Relation Extraction. (arXiv:2205.01909v1 [cs.CL])
    We target document-level relation extraction in an end-to-end setting, where the model needs to jointly perform mention extraction, coreference resolution (COREF) and relation extraction (RE) at once, and is evaluated in an entity-centric way. In particular, we address the two-way interaction between COREF and RE that has not been the focus of previous work, and propose to introduce an explicit interaction, namely Graph Compatibility (GC), that is specifically designed to leverage task characteristics, bridging decisions of the two tasks for direct task interference. Our experiments are conducted on DocRED and DWIE; in addition to GC, we implement and compare different multi-task settings commonly adopted in previous work, including pipeline, shared encoders, and graph propagation, to examine the effectiveness of different interactions. The results show that GC achieves the best performance, with up to 2.3/5.1 F1 improvement over the baseline.
    Assessing Dataset Bias in Computer Vision. (arXiv:2205.01811v1 [cs.CV])
    A biased dataset is a dataset that generally has attributes with an uneven class distribution. These biases have the tendency to propagate to the models that train on them, often leading to poor performance in the minority class. In this project, we explore the extent to which various data augmentation methods alleviate intrinsic biases within the dataset. We apply several augmentation techniques on a sample of the UTKFace dataset, such as undersampling, geometric transformations, variational autoencoders (VAEs), and generative adversarial networks (GANs). We then trained a classifier for each of the augmented datasets and evaluated their performance on the native test set and on external facial recognition datasets. We have also compared their performance to the state-of-the-art attribute classifier trained on the FairFace dataset. Through experimentation, we were able to find that training the model on StarGAN-generated images led to the best overall performance. We also found that training on geometrically transformed images led to similar performance with a much quicker training time. Additionally, the best performing models also exhibit a uniform performance across the classes within each attribute. This signifies that the model was also able to mitigate the biases present in the baseline model that was trained on the original training set. Finally, we were able to show that our model has a better overall performance and consistency on age and ethnicity classification on multiple datasets when compared with the FairFace model. Our final model has an accuracy on the UTKFace test set of 91.75%, 91.30%, and 87.20% for the gender, age, and ethnicity attributes respectively, with a standard deviation of less than 0.1 between the accuracies of the classes of each attribute.
    Signal Decomposition Using Masked Proximal Operators. (arXiv:2202.09338v4 [cs.LG] UPDATED)
    We consider the well-studied problem of decomposing a vector time series signal into components with different characteristics, such as smooth, periodic, nonnegative, or sparse. We propose a simple and general framework in which the components are defined by loss functions (which include constraints), and the signal decomposition is carried out by minimizing the sum of losses of the components (subject to the constraints). When each loss function is the negative log-likelihood of a density for the signal component, our method coincides with maximum a posteriori probability (MAP) estimation; but it also includes many other interesting cases. We give two distributed optimization methods for computing the decomposition, which find the optimal decomposition when the component class loss functions are convex, and are good heuristics when they are not. Both methods require only the masked proximal operator of each of the component loss functions, a generalization of the well-known proximal operator that handles missing entries in its argument. Both methods are distributed, i.e., handle each component separately. We derive tractable methods for evaluating the masked proximal operators of some loss functions that, to our knowledge, have not appeared in the literature.  ( 2 min )
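    As a small illustration of what a masked proximal operator might look like, the sketch below handles the l1 loss f(x) = lam * ||x||_1. It assumes the masked operator minimizes f(x) + (rho/2) * sum over known entries of (x_i - v_i)^2, which is my reading of "a generalization of the proximal operator that handles missing entries" and is not taken from the paper's code. Under that assumption the problem separates per entry: known entries are soft-thresholded at lam/rho, and missing entries take the minimizer of |x_i| alone, which is 0. The function name and inputs are hypothetical.

```python
def masked_prox_l1(v, known, lam, rho=1.0):
    """Masked prox of lam*||x||_1: soft-threshold known entries, zero the rest."""
    kappa = lam / rho  # soft-threshold level
    out = []
    for value, is_known in zip(v, known):
        if not is_known:
            out.append(0.0)             # no data term: minimize |x_i| alone
        elif value > kappa:
            out.append(value - kappa)   # shrink toward zero from above
        elif value < -kappa:
            out.append(value + kappa)   # shrink toward zero from below
        else:
            out.append(0.0)             # small entries are set to zero
    return out

# The third entry is missing (None) and flagged as unknown in the mask.
result = masked_prox_l1([3.0, -0.5, None, -2.0],
                        [True, True, False, True], lam=1.0)
```

    With every entry marked as known, this reduces to the ordinary soft-thresholding proximal operator.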
    TracInAD: Measuring Influence for Anomaly Detection. (arXiv:2205.01362v2 [cs.LG] UPDATED)
    As with many other tasks, neural networks prove very effective for anomaly detection purposes. However, very few deep-learning models are suited for detecting anomalies on tabular datasets. This paper proposes a novel methodology to flag anomalies based on TracIn, an influence measure initially introduced for explicability purposes. The proposed methods can serve to augment any unsupervised deep anomaly detection method. We test our approach using Variational Autoencoders and show that the average influence of a subsample of training points on a test point can serve as a proxy for abnormality. Our model proves to be competitive in comparison with state-of-the-art approaches: it achieves comparable or better performance in terms of detection accuracy on medical and cyber-security tabular benchmark data.
    Better Parameter-free Stochastic Optimization with ODE Updates for Coin-Betting. (arXiv:2006.07507v3 [cs.LG] UPDATED)
    Parameter-free stochastic gradient descent (PFSGD) algorithms do not require setting learning rates while achieving optimal theoretical performance. In practical applications, however, there remains an empirical gap between tuned stochastic gradient descent (SGD) and PFSGD. In this paper, we close the empirical gap with a new parameter-free algorithm based on continuous-time Coin-Betting on truncated models. The new update is derived through the solution of an Ordinary Differential Equation (ODE) and solved in a closed form. We show empirically that this new parameter-free algorithm outperforms algorithms with the "best default" learning rates and almost matches the performance of finely tuned baselines without anything to tune.
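    For readers new to coin-betting optimizers, here is a minimal sketch of the classical discrete-time Krichevsky-Trofimov (KT) bettor in one dimension. This is background only and is not the ODE-based update proposed in the abstract above. The function names and the toy objective f(x) = |x - 3| are assumptions chosen for illustration, and the scheme assumes subgradients bounded by 1 in absolute value.

```python
def kt_coin_betting(subgradient, steps, initial_wealth=1.0):
    """Minimize a 1-D convex function with |subgradient| <= 1, no learning rate.

    Returns the average of the iterates (online-to-batch conversion).
    """
    wealth = initial_wealth
    coin_sum = 0.0       # running sum of past coin outcomes c_i = -g_i
    iterate_sum = 0.0
    for t in range(1, steps + 1):
        beta = coin_sum / t      # KT betting fraction, magnitude below 1
        x = beta * wealth        # the signed bet doubles as the iterate
        g = subgradient(x)
        c = -g                   # negative subgradient acts as the coin outcome
        wealth += c * x          # win or lose the amount that was bet
        coin_sum += c
        iterate_sum += x
    return iterate_sum / steps


def abs_subgradient(x, target=3.0):
    """A subgradient of the toy objective f(x) = |x - target|."""
    if x > target:
        return 1.0
    if x < target:
        return -1.0
    return 0.0


estimate = kt_coin_betting(abs_subgradient, steps=5000)
```

    No step size appears anywhere: the bet size adapts through the accumulated wealth. On this toy problem the averaged iterate should approach the minimizer at 3 as the number of steps grows.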
    ViKiNG: Vision-Based Kilometer-Scale Navigation with Geographic Hints. (arXiv:2202.11271v2 [cs.RO] UPDATED)
    Robotic navigation has been approached as a problem of 3D reconstruction and planning, as well as an end-to-end learning problem. However, long-range navigation requires both planning and reasoning about local traversability, as well as being able to utilize general knowledge about global geography, in the form of a roadmap, GPS, or other side information providing important cues. In this work, we propose an approach that integrates learning and planning, and can utilize side information such as schematic roadmaps, satellite maps and GPS coordinates as a planning heuristic, without relying on them being accurate. Our method, ViKiNG, incorporates a local traversability model, which looks at the robot's current camera observation and a potential subgoal to infer how easily that subgoal can be reached, as well as a heuristic model, which looks at overhead maps for hints and attempts to evaluate the appropriateness of these subgoals in order to reach the goal. These models are used by a heuristic planner to identify the best waypoint in order to reach the final destination. Our method performs no explicit geometric reconstruction, utilizing only a topological representation of the environment. Despite having never seen trajectories longer than 80 meters in its training dataset, ViKiNG can leverage its image-based learned controller and goal-directed heuristic to navigate to goals up to 3 kilometers away in previously unseen environments, and exhibit complex behaviors such as probing potential paths and backtracking when they are found to be non-viable. ViKiNG is also robust to unreliable maps and GPS, since the low-level controller ultimately makes decisions based on egocentric image observations, using maps only as planning heuristics. For videos of our experiments, please check out our project page https://sites.google.com/view/viking-release.
    Bézier Curve Gaussian Processes. (arXiv:2205.01754v1 [stat.ML])
    Probabilistic models for sequential data are the basis for a variety of applications concerned with processing temporally ordered information. The predominant approach in this domain is given by neural networks, which incorporate either stochastic units or components. This paper proposes a new probabilistic sequence model building on probabilistic Bézier curves. Using Gaussian distributed control points, these parametric curves constitute a special case of Gaussian processes (GP). Combined with a Mixture Density network, Bayesian conditional inference can be performed without the need for mean field variational approximation or Monte Carlo simulation, which is a requirement of common approaches. To assess this hybrid model's viability, it is applied to an exemplary sequence prediction task. In this case the model is used for pedestrian trajectory prediction, where a generated prediction also serves as a GP prior. Following this, the initial prediction can be refined using the GP framework by calculating different posterior distributions, in order to adapt more towards a given observed trajectory segment.
    i-Code: An Integrative and Composable Multimodal Learning Framework. (arXiv:2205.01818v1 [cs.LG])
    Human intelligence is multimodal; we integrate visual, linguistic, and acoustic signals to maintain a holistic worldview. Most current pretraining methods, however, are limited to one or two modalities. We present i-Code, a self-supervised pretraining framework where users may flexibly combine the modalities of vision, speech, and language into unified and general-purpose vector representations. In this framework, data from each modality are first given to pretrained single-modality encoders. The encoder outputs are then integrated with a multimodal fusion network, which uses novel attention mechanisms and other architectural innovations to effectively combine information from the different modalities. The entire system is pretrained end-to-end with new objectives including masked modality unit modeling and cross-modality contrastive learning. Unlike previous research using only video for pretraining, the i-Code framework can dynamically process single, dual, and triple-modality data during training and inference, flexibly projecting different combinations of modalities into a single representation space. Experimental results demonstrate how i-Code can outperform state-of-the-art techniques on five video understanding tasks and the GLUE NLP benchmark, improving by as much as 11% and demonstrating the power of integrative multimodal pretraining.
    Lifelong Ensemble Learning based on Multiple Representations for Few-Shot Object Recognition. (arXiv:2205.01982v1 [cs.RO])
    Service robots are integrating more and more into our daily lives to help us with various tasks. In such environments, robots frequently face new objects while working in the environment and need to learn them in an open-ended fashion. Furthermore, such robots must be able to recognize a wide range of object categories. In this paper, we present a lifelong ensemble learning approach based on multiple representations to address the few-shot object recognition problem. In particular, we form ensemble methods based on deep representations and handcrafted 3D shape descriptors. To facilitate lifelong learning, each approach is equipped with a memory unit for storing and retrieving object information instantly. The proposed model is suitable for open-ended learning scenarios where the number of 3D object categories is not fixed and can grow over time. We have performed extensive sets of experiments to assess the performance of the proposed approach in offline, and open-ended scenarios. For the evaluation purpose, in addition to real object datasets, we generate a large synthetic household objects dataset consisting of 27000 views of 90 objects. Experimental results demonstrate the effectiveness of the proposed method on 3D object recognition tasks, as well as its superior performance over the state-of-the-art approaches. Additionally, we demonstrated the effectiveness of our approach in both simulated and real-robot settings, where the robot rapidly learned new categories from limited examples.
    Splicing Detection and Localization In Satellite Imagery Using Conditional GANs. (arXiv:2205.01805v1 [cs.CV])
    The widespread availability of image editing tools and improvements in image processing techniques make image manipulation very easy. Oftentimes, easy-to-use yet sophisticated image manipulation tools yield distortions or changes imperceptible to the human observer. Distribution of forged images can have drastic ramifications, especially when coupled with the speed and vastness of the Internet. Therefore, verifying image integrity poses an immense and important challenge to the digital forensic community. Satellite images specifically can be modified in a number of ways, including the insertion of objects to hide existing scenes and structures. In this paper, we describe the use of a Conditional Generative Adversarial Network (cGAN) to identify the presence of such spliced forgeries within satellite images. Additionally, we identify their locations and shapes. Trained on pristine and falsified images, our method achieves high success on these detection and localization objectives.
    Adversarial Training for High-Stakes Reliability. (arXiv:2205.01663v2 [cs.LG] UPDATED)
    In the future, powerful AI systems may be deployed in high-stakes settings, where a single failure could be catastrophic. One technique for improving AI safety in high-stakes settings is adversarial training, which uses an adversary to generate examples to train on in order to achieve better worst-case performance. In this work, we used a language generation task as a testbed for achieving high reliability through adversarial training. We created a series of adversarial training techniques -- including a tool that assists human adversaries -- to find and eliminate failures in a classifier that filters text completions suggested by a generator. In our simple "avoid injuries" task, we determined that we can set very conservative classifier thresholds without significantly impacting the quality of the filtered outputs. With our chosen thresholds, filtering with our baseline classifier decreases the rate of unsafe completions from about 2.4% to 0.003% on in-distribution data, which is near the limit of our ability to measure. We found that adversarial training significantly increased robustness to the adversarial attacks that we trained on, without affecting in-distribution performance. We hope to see further work in the high-stakes reliability setting, including more powerful tools for enhancing human adversaries and better ways to measure high levels of reliability, until we can confidently rule out the possibility of catastrophic deployment-time failures of powerful models.
    Generalized Reference Kernel for One-class Classification. (arXiv:2205.00534v2 [cs.LG] UPDATED)
    In this paper, we formulate a new generalized reference kernel hoping to improve the original base kernel using a set of reference vectors. Depending on the selected reference vectors, our formulation shows similarities to approximate kernels, random mappings, and Non-linear Projection Trick. Focusing on small-scale one-class classification, our analysis and experimental results show that the new formulation provides approaches to regularize, adjust the rank, and incorporate additional information into the kernel itself, leading to improved one-class classification accuracy.
    Brainish: Formalizing A Multimodal Language for Intelligence and Consciousness. (arXiv:2205.00001v2 [cs.AI] UPDATED)
    Having a rich multimodal inner language is an important component of human intelligence that enables several necessary core cognitive functions such as multimodal prediction, translation, and generation. Building upon the Conscious Turing Machine (CTM), a machine model for consciousness proposed by Blum and Blum (2021), we describe the desiderata of a multimodal language called Brainish, comprising words, images, audio, and sensations combined in representations that the CTM's processors use to communicate with each other. We define the syntax and semantics of Brainish before operationalizing this language through the lens of multimodal artificial intelligence, a vibrant research area studying the computational tools necessary for processing and relating information from heterogeneous signals. Our general framework for learning Brainish involves designing (1) unimodal encoders to segment and represent unimodal data, (2) a coordinated representation space that relates and composes unimodal features to derive holistic meaning across multimodal inputs, and (3) decoders to map multimodal representations into predictions (for fusion) or raw data (for translation or generation). Through discussing how Brainish is crucial for communication and coordination in order to achieve consciousness in the CTM, and by implementing a simple version of Brainish and evaluating its capability of demonstrating intelligence on multimodal prediction and retrieval tasks on several real-world image, text, and audio datasets, we argue that such an inner language will be important for advances in machine models of intelligence and consciousness.
    Sparse Representations of Positive Functions via First and Second-Order Pseudo-Mirror Descent. (arXiv:2011.07142v4 [stat.ML] UPDATED)
    We consider expected risk minimization problems when the range of the estimator is required to be nonnegative, motivated by the settings of maximum likelihood estimation (MLE) and trajectory optimization. To facilitate nonlinear interpolation, we hypothesize that the search space is a Reproducing Kernel Hilbert Space (RKHS). We develop first- and second-order variants of stochastic mirror descent employing (i) pseudo-gradients and (ii) complexity-reducing projections. Compressive projection in the first-order scheme is executed via kernel orthogonal matching pursuit (KOMP), which overcomes the fact that the vanilla RKHS parameterization grows unbounded with the iteration index in the stochastic setting. Moreover, pseudo-gradients are needed when gradient estimates for cost are only computable up to some numerical error, which arises in, e.g., integral approximations. Under constant step-size and compression budget, we establish tradeoffs between the radius of convergence of the expected sub-optimality and the projection budget parameter, as well as non-asymptotic bounds on the model complexity. To refine the solution's precision, we develop a second-order extension which employs recursively averaged pseudo-gradient outer-products to approximate the Hessian inverse, whose convergence in mean is established under an additional eigenvalue decay condition on the Hessian of the optimal RKHS element, which is unique to this work. Experiments demonstrate favorable performance on inhomogeneous Poisson Process intensity estimation in practice.
    Wavelet neural operator: a neural operator for parametric partial differential equations. (arXiv:2205.02191v1 [physics.comp-ph])
    With massive advancements in sensor technologies and the Internet of Things, we now have access to terabytes of historical data; however, there is a lack of clarity in how to best exploit the data to predict future events. One possible alternative in this context is to utilize operator learning algorithms that directly learn nonlinear mappings between two functional spaces; this facilitates real-time prediction of naturally arising complex evolutionary dynamics. In this work, we introduce a novel operator learning algorithm referred to as the Wavelet Neural Operator (WNO) that blends an integral kernel with a wavelet transformation. WNO harnesses the superiority of wavelets in time-frequency localization of functions and enables accurate tracking of patterns in the spatial domain and effective learning of the functional mappings. Since wavelets are localized in both time/space and frequency, WNO can provide high spatial and frequency resolution. This allows learning of the finer details of the parametric dependencies in the solution for complex problems. The efficacy and robustness of the proposed WNO are illustrated on a wide array of problems involving Burgers' equation, Darcy flow, the Navier-Stokes equation, the Allen-Cahn equation, and the wave advection equation. Comparative studies with respect to existing operator learning frameworks are presented. Finally, the proposed approach is used to build a digital twin capable of predicting Earth's air temperature based on available historical data.
    SMLT: A Serverless Framework for Scalable and Adaptive Machine Learning Design and Training. (arXiv:2205.01853v1 [cs.DC])
    In today's production machine learning (ML) systems, models are continuously trained, improved, and deployed. ML design and training are becoming a continuous workflow of various tasks that have dynamic resource demands. Serverless computing is an emerging cloud paradigm that provides transparent resource management and scaling for users and has the potential to revolutionize the routine of ML design and training. However, hosting modern ML workflows on existing serverless platforms poses non-trivial challenges due to their intrinsic design limitations, such as their stateless nature, limited communication support across function instances, and limited function execution duration. These limitations result in a lack of an overarching view and adaptation mechanism for training dynamics and an amplification of existing problems in ML workflows. To address the above challenges, we propose SMLT, an automated, scalable, and adaptive serverless framework to enable efficient and user-centric ML design and training. SMLT employs an automated and adaptive scheduling mechanism to dynamically optimize the deployment and resource scaling for ML tasks during training. SMLT further enables user-centric ML workflow execution by supporting user-specified training deadlines and budget limits. In addition, by providing an end-to-end design, SMLT solves the intrinsic problems in serverless platforms such as the communication overhead, limited function execution duration, and need for repeated initialization, and also provides explicit fault tolerance for ML training. SMLT is open-sourced and compatible with all major ML frameworks. Our experimental evaluation with large, sophisticated modern ML models demonstrates that SMLT outperforms the state-of-the-art VM-based systems and existing serverless ML training frameworks in both training speed (up to 8X) and monetary cost (up to 3X).
    Stochastic Coded Federated Learning with Convergence and Privacy Guarantees. (arXiv:2201.10092v4 [cs.LG] UPDATED)
    Federated learning (FL) has attracted much attention as a privacy-preserving distributed machine learning framework, where many clients collaboratively train a machine learning model by exchanging model updates with a parameter server instead of sharing their raw data. Nevertheless, FL training suffers from slow convergence and unstable performance due to stragglers caused by the heterogeneous computational resources of clients and fluctuating communication rates. This paper proposes a coded FL framework to mitigate the straggler issue, namely stochastic coded federated learning (SCFL). In this framework, each client generates a privacy-preserving coded dataset by adding noise to a random linear combination of its local data. The server collects the coded datasets from all the clients to construct a composite dataset, which helps to compensate for the straggling effect. In the training process, the server as well as clients perform mini-batch stochastic gradient descent (SGD), and the server adds a make-up term in model aggregation to obtain unbiased gradient estimates. We characterize the privacy guarantee by the mutual information differential privacy (MI-DP) and analyze the convergence performance in federated learning. Besides, we demonstrate a privacy-performance tradeoff of the proposed SCFL method by analyzing the influence of the privacy constraint on the convergence rate. Finally, numerical experiments corroborate our analysis and show the benefits of SCFL in achieving fast convergence while preserving data privacy.
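    The coded-dataset step described above (noise added to a random linear combination of local data) might be sketched as follows. This is only an illustration under stated assumptions: the Gaussian mixing weights, the Gaussian noise, and the names `coded_dataset`, `num_coded_rows`, and `noise_std` are hypothetical choices, and the paper's actual construction and noise calibration may differ.

```python
import random


def coded_dataset(local_data, num_coded_rows, noise_std, seed=0):
    """Return noisy random linear combinations of the rows of local_data.

    Each coded row is sum_i g_i * x_i + n, with g_i ~ N(0, 1) (assumed) and
    n ~ N(0, noise_std^2) drawn independently for every coordinate (assumed).
    """
    rng = random.Random(seed)
    dim = len(local_data[0])
    coded = []
    for _ in range(num_coded_rows):
        # One random weight per local sample mixes all samples together.
        weights = [rng.gauss(0.0, 1.0) for _ in local_data]
        row = []
        for j in range(dim):
            mix = sum(w * x[j] for w, x in zip(weights, local_data))
            # Additive noise hides the exact mixture from the server.
            row.append(mix + rng.gauss(0.0, noise_std))
        coded.append(row)
    return coded


local = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
coded = coded_dataset(local, num_coded_rows=2, noise_std=0.1)
```

    Larger `noise_std` values hide the local samples better but make the coded rows less useful for training, which is the privacy-performance tradeoff the abstract refers to.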
    Explain to Not Forget: Defending Against Catastrophic Forgetting with XAI. (arXiv:2205.01929v1 [cs.LG])
    The ability to continuously process and retain new information, as humans do naturally, is a feat that is highly sought after when training neural networks. Unfortunately, traditional optimization algorithms often require large amounts of data available during training time, and updates with respect to new data are difficult after the training process has been completed. In fact, when new data or tasks arise, previous progress may be lost as neural networks are prone to catastrophic forgetting. Catastrophic forgetting describes the phenomenon in which a neural network completely forgets previous knowledge when given new information. We propose a novel training algorithm called training by explaining in which we leverage Layer-wise Relevance Propagation in order to retain the information a neural network has already learned in previous tasks when training on new data. The method is evaluated on a range of benchmark datasets as well as more complex data. Our method not only successfully retains the knowledge of old tasks within the neural networks but does so more resource-efficiently than other state-of-the-art solutions.
    Exploring Rawlsian Fairness for K-Means Clustering. (arXiv:2205.02052v1 [cs.LG])
    We conduct an exploratory study that looks at incorporating John Rawls' ideas on fairness into existing unsupervised machine learning algorithms. Our focus is on the task of clustering, specifically the k-means clustering algorithm. To the best of our knowledge, this is the first work that uses Rawlsian ideas in clustering. Towards this, we attempt to develop a postprocessing technique, i.e., one that operates on the cluster assignment generated by the standard k-means clustering algorithm. Our technique perturbs this assignment over a number of iterations to make it fairer according to Rawls' difference principle while minimally affecting the overall utility. As the first step, we consider two simple perturbation operators -- $\mathbf{R_1}$ and $\mathbf{R_2}$ -- that reassign examples in a given cluster assignment to new clusters; $\mathbf{R_1}$ assigning a single example to a new cluster, and $\mathbf{R_2}$ a pair of examples to new clusters. Our experiments on a sample of the Adult dataset demonstrate that both operators make meaningful perturbations in the cluster assignment towards incorporating Rawls' difference principle, with $\mathbf{R_2}$ being more efficient than $\mathbf{R_1}$ in terms of the number of iterations. However, we observe that there is still a need to design operators that make significantly better perturbations. Nevertheless, both operators provide good baselines for designing and comparing any future operator, and we hope our findings will aid future work in this direction.
    Depth Uncertainty Networks for Active Learning. (arXiv:2112.06796v2 [cs.LG] UPDATED)
    In active learning, the size and complexity of the training dataset change over time. Simple models that are well specified by the amount of data available at the start of active learning might suffer from bias as more points are actively sampled. Flexible models that might be well suited to the full dataset can suffer from overfitting towards the start of active learning. We tackle this problem using Depth Uncertainty Networks (DUNs), a BNN variant in which the depth of the network, and thus its complexity, is inferred. We find that DUNs outperform other BNN variants on several active learning tasks. Importantly, we show that on the tasks in which DUNs perform best they present notably less overfitting than baselines.
    Understanding CNNs from excitations. (arXiv:2205.00932v2 [cs.CV] UPDATED)
    For instance-level explanation, in order to reveal the relations between high-level semantics and detailed spatial information, this paper proposes a novel cognitive approach to neural networks, named PANE. Under the guidance of PANE, a novel saliency map representation method, named IOM, is proposed for CNN-like models. We compare it with eight state-of-the-art saliency map representation methods. The experimental results show that IOM far outperforms the baselines. This work may bring a new perspective for understanding deep neural networks.
    Diverse Image Captioning with Grounded Style. (arXiv:2205.01813v1 [cs.CV])
    Stylized image captioning as presented in prior work aims to generate captions that reflect characteristics beyond a factual description of the scene composition, such as sentiments. Such prior work relies on given sentiment identifiers, which are used to express a certain global style in the caption, e.g. positive or negative, however without taking into account the stylistic content of the visual scene. To address this shortcoming, we first analyze the limitations of current stylized captioning datasets and propose COCO attribute-based augmentations to obtain varied stylized captions from COCO annotations. Furthermore, we encode the stylized information in the latent space of a Variational Autoencoder; specifically, we leverage extracted image attributes to explicitly structure its sequential latent space according to different localized style characteristics. Our experiments on the Senticap and COCO datasets show the ability of our approach to generate accurate captions with diversity in styles that are grounded in the image.
    Regret-Optimal Filtering for Prediction and Estimation. (arXiv:2101.10357v3 [math.OC] UPDATED)
    The filtering problem of causally estimating a desired signal from a related observation signal is investigated through the lens of regret optimization. Classical filter designs, such as $\mathcal H_2$ (Kalman) and $\mathcal H_\infty$, minimize the average and worst-case estimation errors, respectively. As a result, $\mathcal H_2$ filters are sensitive to inaccuracies in the underlying statistical model, and $\mathcal H_\infty$ filters are overly conservative since they safeguard against the worst-case scenario. We propose instead to minimize the regret in order to design filters that perform well in different noise regimes by comparing their performance with that of a clairvoyant filter. More explicitly, we minimize the largest deviation of the squared estimation error of a causal filter from that of a non-causal filter that has access to future observations. In this sense, the regret-optimal filter will have the best competitive performance with respect to the non-causal benchmark filter no matter what the true signal and the observation process are. For the important case of signals that can be described with a time-invariant state-space, we provide an explicit construction for the regret-optimal filter in the estimation (causal) and the prediction (strictly-causal) regimes. These solutions are obtained by reducing the regret filtering problem to a Nehari problem, i.e., approximating a non-causal operator by a causal one in spectral norm. The regret-optimal filters bear some resemblance to Kalman and $\mathcal H_\infty$ filters: they are expressed as state-space models, inherit the finite dimension of the original state-space, and their solutions require solving algebraic Riccati equations. Numerical simulations demonstrate that regret minimization inherently interpolates between the performances of the $\mathcal H_2$ and $\mathcal H_\infty$ filters and is thus a viable approach for filter design.
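    The regret criterion described in words above can be written out as follows. The notation is an assumption made for illustration and may differ from the paper's: $w$ collects the unknown disturbances (driving and observation noise), $e_K(w)$ is the estimation-error sequence produced by a filter $K$, and $K_{\mathrm{nc}}$ is the non-causal benchmark filter. Normalizing by the disturbance energy $\|w\|_2^2$ is also an assumption, made so that the worst case is finite.

```latex
\[
\mathrm{regret}(K) \;=\; \sup_{w \neq 0}
\frac{\|e_K(w)\|_2^2 - \|e_{K_{\mathrm{nc}}}(w)\|_2^2}{\|w\|_2^2},
\qquad
K_{\mathrm{regret}} \in \arg\min_{K\ \text{causal}} \mathrm{regret}(K).
\]
```

    For comparison, dropping the benchmark term $\|e_{K_{\mathrm{nc}}}(w)\|_2^2$ from the numerator leaves the usual worst-case ($\mathcal H_\infty$) criterion, so the regret objective penalizes only the error that a clairvoyant filter could have avoided.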
    Making SGD Parameter-Free. (arXiv:2205.02160v1 [math.OC])
    We develop an algorithm for parameter-free stochastic convex optimization (SCO) whose rate of convergence is only a double-logarithmic factor larger than the optimal rate for the corresponding known-parameter setting. In contrast, the best previously known rates for parameter-free SCO are based on online parameter-free regret bounds, which contain unavoidable excess logarithmic terms compared to their known-parameter counterparts. Our algorithm is conceptually simple, has high-probability guarantees, and is also partially adaptive to unknown gradient norms, smoothness, and strong convexity. At the heart of our results is a novel parameter-free certificate for SGD step size choice, and a time-uniform concentration result that assumes no a-priori bounds on SGD iterates.
    Ensembling Off-the-shelf Models for GAN Training. (arXiv:2112.09130v3 [cs.CV] UPDATED)
    The advent of large-scale training has produced a cornucopia of powerful visual recognition models. However, generative models, such as GANs, have traditionally been trained from scratch in an unsupervised manner. Can the collective "knowledge" from a large bank of pretrained vision models be leveraged to improve GAN training? If so, with so many models to choose from, which one(s) should be selected, and in what manner are they most effective? We find that pretrained computer vision models can significantly improve performance when used in an ensemble of discriminators. Notably, the particular subset of selected models greatly affects performance. We propose an effective selection mechanism, by probing the linear separability between real and fake samples in pretrained model embeddings, choosing the most accurate model, and progressively adding it to the discriminator ensemble. Interestingly, our method can improve GAN training in both limited data and large-scale settings. Given only 10k training samples, our FID on LSUN Cat matches the StyleGAN2 trained on 1.6M images. On the full dataset, our method improves FID by 1.5x to 2x on cat, church, and horse categories of LSUN.
    AIFB-WebScience at SemEval-2022 Task 12: Relation Extraction First -- Using Relation Extraction to Identify Entities. (arXiv:2203.05325v2 [cs.CL] UPDATED)
    In this paper, we present an end-to-end joint entity and relation extraction approach based on transformer-based language models. We apply the model to the task of linking mathematical symbols to their descriptions in LaTeX documents. In contrast to existing approaches, which perform entity and relation extraction in sequence, our system incorporates information from relation extraction into entity extraction. This means that the system can be trained even on data sets where only a subset of all valid entity spans is annotated. We provide an extensive evaluation of the proposed system and its strengths and weaknesses. Our approach, which can be scaled dynamically in computational complexity at inference time, produces predictions with high precision and reaches 3rd place in the leaderboard of SemEval-2022 Task 12. For inputs in the domain of physics and math, it achieves high relation extraction macro F1 scores of 95.43% and 79.17%, respectively. The code used for training and evaluating our models is available at: https://github.com/nicpopovic/RE1st
    Inverting brain grey matter models with likelihood-free inference: a tool for trustable cytoarchitecture measurements. (arXiv:2111.08693v2 [q-bio.QM] UPDATED)
    Effective characterisation of the brain grey matter cytoarchitecture with quantitative sensitivity to soma density and volume remains an unsolved challenge in diffusion MRI (dMRI). Solving the problem of relating the dMRI signal with cytoarchitectural characteristics calls for the definition of a mathematical model that describes brain tissue via a handful of physiologically-relevant parameters and an algorithm for inverting the model. To address this issue, we propose a new forward model, specifically a new system of equations, requiring a few relatively sparse b-shells. We then apply modern tools from Bayesian analysis known as likelihood-free inference (LFI) to invert our proposed model. As opposed to other approaches from the literature, our algorithm yields not only an estimation of the parameter vector $\theta$ that best describes a given observed data point $x_0$, but also a full posterior distribution $p(\theta|x_0)$ over the parameter space. This enables a richer description of the model inversion, providing indicators such as credible intervals for the estimated parameters and a complete characterization of the parameter regions where the model may present indeterminacies. We approximate the posterior distribution using deep neural density estimators, known as normalizing flows, and fit them using a set of repeated simulations from the forward model. We validate our approach on simulations using dmipy and then apply the whole pipeline on two publicly available datasets.
    The leap to ordinal: detailed functional prognosis after traumatic brain injury with a flexible modelling approach. (arXiv:2202.04801v2 [cs.LG] UPDATED)
    When a patient is admitted to the intensive care unit (ICU) after a traumatic brain injury (TBI), an early prognosis is essential for baseline risk adjustment and shared decision making. TBI outcomes are commonly categorised by the Glasgow Outcome Scale-Extended (GOSE) into 8 ordered levels of functional recovery at 6 months after injury. Existing ICU prognostic models predict binary outcomes at a certain threshold of GOSE (e.g., prediction of survival [GOSE>1] or functional independence [GOSE>4]). We aimed to develop ordinal prediction models that concurrently predict probabilities of each GOSE score. From a prospective cohort (n=1,550, 65 centres) in the ICU stratum of the Collaborative European NeuroTrauma Effectiveness Research in TBI (CENTER-TBI) patient dataset, we extracted all clinical information within 24 hours of ICU admission (1,151 predictors) and 6-month GOSE scores. We analysed the effect of 2 design elements on ordinal model performance: (1) the baseline predictor set, ranging from a concise set of 10 validated predictors to a token-embedded representation of all possible predictors, and (2) the modelling strategy, from ordinal logistic regression to multinomial deep learning. With repeated k-fold cross-validation, we found that expanding the baseline predictor set significantly improved ordinal prediction performance while increasing analytical complexity did not. Half of these gains could be achieved with the addition of 8 high-impact predictors (2 demographic variables, 4 protein biomarkers, and 2 severity assessments) to the concise set. At best, ordinal models achieved 0.76 (95% CI: 0.74-0.77) ordinal discrimination ability (ordinal c-index) and 57% (95% CI: 54%-60%) explanation of ordinal variation in 6-month GOSE (Somers' D). Our results motivate the search for informative predictors for higher GOSE and the development of ordinal dynamic prediction models.
    On Deep Neural Network Calibration by Regularization and its Impact on Refinement. (arXiv:2106.09385v3 [cs.LG] UPDATED)
    Deep neural networks have been shown to be highly miscalibrated; they often tend to be overconfident in their predictions. This poses a significant challenge for safety-critical systems that need to use deep neural networks (DNNs) reliably. Many recently proposed approaches to mitigate this have demonstrated substantial progress in improving DNN calibration. However, they hardly touch upon refinement, which historically has been an essential aspect of calibration. Refinement indicates separability of a network's correct and incorrect predictions. This paper presents a theoretically and empirically supported exposition reviewing refinement of a calibrated model. Firstly, we show the breakdown of expected calibration error (ECE) into predicted confidence and refinement under the assumption of over-confident predictions. Secondly, linking with this result, we highlight that regularization-based calibration only focuses on naively reducing a model's confidence. This logically has a severe downside to a model's refinement as correct and incorrect predictions become tightly coupled. Lastly, connecting refinement with ECE also provides support to existing refinement-based approaches which improve calibration but do not explain the reasoning behind it. We support our claims through rigorous empirical evaluations of many state-of-the-art calibration approaches on widely used datasets and neural networks. We find that many calibration approaches, such as label smoothing and mixup, lower the usefulness of a DNN by degrading its refinement. Even under natural data shift, this calibration-refinement trade-off holds for the majority of calibration methods.  ( 2 min )
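For readers unfamiliar with the metric, the expected calibration error mentioned in this abstract is commonly estimated by binning predictions by confidence and averaging the gap between accuracy and mean confidence in each bin. The sketch below shows that common binned estimator in plain Python; the function name, the equal-width binning, and the default of 10 bins are assumptions made for illustration, and it is not the paper's ECE decomposition.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: sample-weighted mean of |accuracy - confidence| per bin.

    confidences: predicted top-label probabilities in [0, 1]
    correct:     1/0 (or True/False) flags, one per prediction
    """
    n = len(confidences)
    conf_sum = [0.0] * n_bins   # sum of confidences per bin
    acc_sum = [0.0] * n_bins    # number of correct predictions per bin
    count = [0] * n_bins        # number of predictions per bin

    for conf, hit in zip(confidences, correct):
        # Equal-width bins; a confidence of exactly 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        conf_sum[idx] += conf
        acc_sum[idx] += float(hit)
        count[idx] += 1

    ece = 0.0
    for c_sum, a_sum, c in zip(conf_sum, acc_sum, count):
        if c:  # skip empty bins
            ece += (c / n) * abs(a_sum / c - c_sum / c)
    return ece


# A model that is always 100% confident but right only half the time.
overconfident = expected_calibration_error([1.0] * 4, [1, 0, 1, 0])
# A model that reports 75% confidence and is right 75% of the time.
calibrated = expected_calibration_error([0.75] * 4, [1, 1, 1, 0])
```

Note that a low value here says nothing about refinement: a model can reach zero ECE by outputting the same confidence for correct and incorrect predictions, which is the trade-off the abstract highlights.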
    Themis: A Network Bandwidth-Aware Collective Scheduling Policy for Distributed Training of DL Models. (arXiv:2110.04478v2 [cs.DC] UPDATED)
    Distributed training is a solution to reduce DNN training time by splitting the task across multiple NPUs (e.g., GPU/TPU). However, distributed training adds communication overhead between the NPUs in order to synchronize the gradients and/or activation, depending on the parallelization strategy. In next-generation platforms for training at scale, NPUs will be connected through multi-dimensional networks with diverse, heterogeneous bandwidths. This work identifies a looming challenge of keeping all network dimensions busy and maximizing the network BW within the hybrid environment if we leverage scheduling techniques for collective communication on systems today. We propose Themis, a novel collective scheduling scheme that dynamically schedules collectives (divided into chunks) to balance the communication loads across all dimensions, further improving the network BW utilization. Our results show that on average, Themis can improve the network BW utilization of the single All-Reduce by 1.72X (2.70X max), and improve the end-to-end training iteration performance of real workloads such as ResNet-152, GNMT, DLRM, and Transformer-1T by 1.49X (2.25X max), 1.30X (1.78X max), 1.30X (1.77X max), and 1.25X (1.53X max), respectively.
    PocketNN: Integer-only Training and Inference of Neural Networks via Direct Feedback Alignment and Pocket Activations in Pure C++. (arXiv:2201.02863v6 [cs.LG] UPDATED)
    Standard deep learning algorithms are implemented using floating-point real numbers. This presents an obstacle for implementing them on low-end devices which may not have dedicated floating-point units (FPUs). As a result, researchers in tinyML have considered machine learning algorithms that can train and run a deep neural network (DNN) on a low-end device using integer operations only. In this paper we propose PocketNN, a light and self-contained proof-of-concept framework in pure C++ for the training and inference of DNNs using only integers. Unlike other approaches, PocketNN directly operates on integers without requiring any explicit quantization algorithms or customized fixed-point formats. This was made possible by pocket activations, which are a family of activation functions devised for integer-only DNNs, and an emerging DNN training algorithm called direct feedback alignment (DFA). Unlike the standard backpropagation (BP), DFA trains each layer independently, thus avoiding integer overflow which is a key problem when using BP with integer-only operations. We used PocketNN to train some DNNs on two well-known datasets, MNIST and Fashion-MNIST. Our experiments show that the DNNs trained with our PocketNN achieved 96.98% and 87.7% accuracies on MNIST and Fashion-MNIST datasets, respectively. The accuracies are very close to the equivalent DNNs trained using BP with floating-point real number operations, such that accuracy degradations were just 1.02%p and 2.09%p, respectively. Finally, our PocketNN has high compatibility and portability for low-end devices as it is open source and implemented in pure C++ without any dependencies.
    Negative Sampling in Variational Autoencoders. (arXiv:1910.02760v3 [cs.LG] UPDATED)
    Modern deep artificial neural networks have achieved great success in the domain of computer vision and beyond. However, their application to many real-world tasks is undermined by certain limitations, such as overconfident uncertainty estimates on out-of-distribution data or performance deterioration under data distribution shifts. Several types of deep learning models used for density estimation through probabilistic generative modeling have been shown to fail to detect out-of-distribution samples by assigning higher likelihoods to anomalous data. We investigate this failure mode in Variational Autoencoder models, which are also prone to this, and improve upon the out-of-distribution generalization performance of the model by employing an alternative training scheme utilizing negative samples. We present a fully unsupervised version: when the model is trained in an adversarial manner, the generator's own outputs can be used as negative samples. We demonstrate empirically the effectiveness of the approach in reducing the overconfident likelihood estimates of out-of-distribution inputs on image data.
    Evaluating Transferability for Covid 3D Localization Using CT SARS-CoV-2 segmentation models. (arXiv:2205.02152v1 [eess.IV])
    Recent studies indicate that detecting radiographic patterns on CT scans can yield high sensitivity and specificity for COVID-19 localization. In this paper, we investigate the transferability of deep learning models for semantic segmentation of pneumonia-infected areas in CT images. Transfer learning allows for the fast initialization/reutilization of detection models when large volumes of training data are not available. Our work explores the efficacy of using U-Net architectures, pre-trained on a specific CT data set, for identifying Covid-19 side-effects in images from different datasets. Experimental results indicate improvement in the segmentation accuracy of identifying COVID-19 infected regions.  ( 2 min )
    Multistage linguistic conditioning of convolutional layers for speech emotion recognition. (arXiv:2110.06650v2 [cs.LG] UPDATED)
    In this contribution, we investigate the effectiveness of deep fusion of text and audio features for categorical and dimensional speech emotion recognition (SER). We propose a novel, multistage fusion method where the two information streams are integrated in several layers of a deep neural network (DNN), and contrast it with a single-stage one where the streams are merged in a single point. Both methods depend on extracting summary linguistic embeddings from a pre-trained BERT model, and conditioning one or more intermediate representations of a convolutional model operating on log-Mel spectrograms. Experiments on the MSP-Podcast and IEMOCAP datasets demonstrate that the two fusion methods clearly outperform a shallow (late) fusion baseline and their unimodal constituents, both in terms of quantitative performance and qualitative behaviour. Overall, our multistage fusion shows better quantitative performance, surpassing alternatives on most of our evaluations. This illustrates the potential of multistage fusion in better assimilating text and audio information.  ( 2 min )
    Informativeness and Invariance: Two Perspectives on Spurious Correlations in Natural Language. (arXiv:2204.04487v2 [cs.CL] UPDATED)
    Spurious correlations are a threat to the trustworthiness of natural language processing systems, motivating research into methods for identifying and eliminating them. However, addressing the problem of spurious correlations requires more clarity on what they are and how they arise in language data. Gardner et al (2021) argue that the compositional nature of language implies that \emph{all} correlations between labels and individual "input features" are spurious. This paper analyzes this proposal in the context of a toy example, demonstrating three distinct conditions that can give rise to feature-label correlations in a simple PCFG. Linking the toy example to a structured causal model shows that (1) feature-label correlations can arise even when the label is invariant to interventions on the feature, and (2) feature-label correlations may be absent even when the label is sensitive to interventions on the feature. Because input features will be individually correlated with labels in all but very rare circumstances, domain knowledge must be applied to identify spurious correlations that pose genuine robustness threats.
    Zero-Shot Text-Guided Object Generation with Dream Fields. (arXiv:2112.01455v2 [cs.CV] UPDATED)
    We combine neural rendering with multi-modal image and text representations to synthesize diverse 3D objects solely from natural language descriptions. Our method, Dream Fields, can generate the geometry and color of a wide range of objects without 3D supervision. Due to the scarcity of diverse, captioned 3D data, prior methods only generate objects from a handful of categories, such as ShapeNet. Instead, we guide generation with image-text models pre-trained on large datasets of captioned images from the web. Our method optimizes a Neural Radiance Field from many camera views so that rendered images score highly with a target caption according to a pre-trained CLIP model. To improve fidelity and visual quality, we introduce simple geometric priors, including sparsity-inducing transmittance regularization, scene bounds, and new MLP architectures. In experiments, Dream Fields produce realistic, multi-view consistent object geometry and color from a variety of natural language captions.  ( 2 min )
    An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks. (arXiv:2203.16773v2 [eess.AS] UPDATED)
    Speech representations learned from Self-supervised learning (SSL) models can benefit various speech processing tasks. However, utilizing SSL representations usually requires fine-tuning the pre-trained models or designing task-specific downstream models and loss functions, causing much memory usage and human labor. Recently, prompting in Natural Language Processing (NLP) has been found to be an efficient technique to leverage pre-trained language models (LMs). Specifically, prompt tuning optimizes a limited number of task-specific parameters with a fixed pre-trained model; as a result, only a small set of parameters is needed to be stored for each task. Prompt tuning improves computation and memory efficiency by leveraging the pre-trained LM's prediction ability. Nevertheless, such a paradigm is little studied in the speech community. We report in this paper the first exploration of the prompt tuning paradigm for speech processing tasks based on Generative Spoken Language Model (GSLM). Experiment results show that the prompt tuning technique achieves competitive performance in speech classification tasks with fewer trainable parameters than fine-tuning specialized downstream models. We further study the technique in challenging sequence generation tasks. Prompt tuning also demonstrates its potential, while the limitation and possible research directions are discussed in this paper. The source code is available on https://github.com/ga642381/SpeechPrompt.
    Local versions of sum-of-norms clustering. (arXiv:2109.09589v2 [cs.LG] UPDATED)
    Sum-of-norms clustering is a convex optimization problem whose solution can be used for the clustering of multivariate data. We propose and study a localized version of this method, and show in particular that it can separate arbitrarily close balls in the stochastic ball model. More precisely, we prove a quantitative bound on the error incurred in the clustering of disjoint connected sets. Our bound is expressed in terms of the number of datapoints and the localization length of the functional.
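For context, the convex problem usually meant by sum-of-norms clustering can be written as below. This is the standard global formulation as commonly stated in the literature, not the localized functional studied in the paper; the symbols $a_i$ (data points), $x_i$ (cluster representatives), and $\lambda$ (regularization weight) are notation chosen here for illustration.

```latex
% Standard (non-localized) sum-of-norms clustering of points a_1, ..., a_n in R^d.
% Points i and j are assigned to the same cluster when x_i^* = x_j^* at the optimum.
\min_{x_1, \dots, x_n \in \mathbb{R}^d}
  \; \frac{1}{2} \sum_{i=1}^{n} \lVert x_i - a_i \rVert_2^2
  \; + \; \lambda \sum_{1 \le i < j \le n} \lVert x_i - x_j \rVert_2 ,
  \qquad \lambda > 0 .
```

The first term keeps each representative close to its data point, while the unsquared norms in the second term encourage representatives to coincide exactly, which is what produces clusters.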
    FederatedScope: A Flexible Federated Learning Platform for Heterogeneity. (arXiv:2204.05011v3 [cs.LG] UPDATED)
    Although remarkable progress has been made by the existing federated learning (FL) platforms to provide fundamental functionalities for development, these platforms cannot well tackle the challenges brought by the heterogeneity of FL scenarios from both academia and industry. To fill this gap, in this paper, we propose a flexible federated learning platform, named FederatedScope, for handling various types of heterogeneity in FL. Considering both flexibility and extensibility, FederatedScope adopts an event-driven architecture to frame an FL course into event-handler pairs: the behaviors of participants are described in handlers, and triggered by events of message passing or meeting certain conditions in training. For a new FL application, developers only need to specify the adopted FL algorithm by defining new types of events and the corresponding handling functions based on participants' behaviors, which would be automatically executed in an asynchronous way for balancing effectiveness and efficiency in FederatedScope. Meanwhile, towards an easy-to-use platform, FederatedScope provides rich built-in algorithms, including personalization, federated aggregation, privacy protection, and privacy attack, for users to conveniently customize participant-specific training, fusing, aggregating, and protecting. Besides, a federated hyperparameter optimization module is integrated into FederatedScope for users to automatically tune their FL systems for resolving the unstable issues brought by heterogeneity. We conduct a series of experiments on the provided easy-to-use and comprehensive FL benchmarks to validate the correctness and efficiency of FederatedScope. We have released FederatedScope for users on https://github.com/alibaba/FederatedScope to promote research and industrial deployment of federated learning in a variety of real-world applications.
    Efficient Accelerator for Dilated and Transposed Convolution with Decomposition. (arXiv:2205.02103v1 [cs.AR])
    Hardware acceleration for dilated and transposed convolution enables real time execution of related tasks like segmentation, but current designs are specific for these convolutional types or suffer from complex control for reconfigurable designs. This paper presents a design that decomposes input or weight for dilated and transposed convolutions respectively to skip redundant computations and thus executes efficiently on existing dense CNN hardware as well. The proposed architecture can cut down 87.8% of the cycle counts to achieve 8.2X speedup over a naive execution for the ENet case.
    Music-to-Dance Generation with Optimal Transport. (arXiv:2112.01806v2 [cs.SD] UPDATED)
    Dance choreography for a piece of music is a challenging task, having to be creative in presenting distinctive stylistic dance elements while taking into account the musical theme and rhythm. It has been tackled by different approaches such as similarity retrieval, sequence-to-sequence modeling and generative adversarial networks, but their generated dance sequences are often short of motion realism, diversity and music consistency. In this paper, we propose a Music-to-Dance with Optimal Transport Network (MDOT-Net) for learning to generate 3D dance choreographies from music. We introduce an optimal transport distance for evaluating the authenticity of the generated dance distribution and a Gromov-Wasserstein distance to measure the correspondence between the dance distribution and the input music. This gives a well defined and non-divergent training objective that mitigates the limitation of standard GAN training which is frequently plagued with instability and divergent generator loss issues. Extensive experiments demonstrate that our MDOT-Net can synthesize realistic and diverse dances which achieve an organic unity with the input music, reflecting the shared intentionality and matching the rhythmic articulation. Sample results are found at https://www.youtube.com/watch?v=dErfBkrlUO8.
    UserIdentifier: Implicit User Representations for Simple and Effective Personalized Sentiment Analysis. (arXiv:2110.00135v2 [cs.LG] UPDATED)
    Global models are trained to be as generalizable as possible, with user invariance considered desirable since the models are shared across multitudes of users. As such, these models are often unable to produce personalized responses for individual users, based on their data. Contrary to widely-used personalization techniques based on few-shot learning, we propose UserIdentifier, a novel scheme for training a single shared model for all users. Our approach produces personalized responses by adding fixed, non-trainable user identifiers to the input data. We empirically demonstrate that this proposed method outperforms the prefix-tuning based state-of-the-art approach by up to 13%, on a suite of sentiment analysis datasets. We also show that, unlike prior work, this method needs neither any additional model parameters nor any extra rounds of few-shot fine-tuning.
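The idea of adding a fixed, non-trainable identifier to each user's input can be sketched as follows. This is only an illustration under stated assumptions: the helper names `user_identifier` and `personalize`, the hash-based way of picking identifier tokens, the toy vocabulary, and the identifier length of 5 are all hypothetical and are not taken from the paper.

```python
import hashlib


def user_identifier(user_id, vocab, length=5):
    """Map a user id to a fixed token sequence (hypothetical scheme).

    The tokens are derived from a hash, so the same user always gets the
    same identifier and nothing about it is learned during training.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # SHA-256 yields 32 bytes, so this supports identifier lengths up to 32.
    return [vocab[b % len(vocab)] for b in digest[:length]]


def personalize(text, user_id, vocab, length=5):
    """Prepend the user's fixed identifier tokens to the input text."""
    prefix = " ".join(user_identifier(user_id, vocab, length))
    return prefix + " " + text


# Toy vocabulary standing in for a model's token vocabulary (assumption).
toy_vocab = ["alpha", "bravo", "charlie", "delta", "echo", "foxtrot"]
example = personalize("the battery life is great", "user_42", toy_vocab)
```

The personalized string would then be fed to a single shared sentiment model in place of the raw text, so no per-user parameters are needed.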
    Differentiable Time-Frequency Scattering in Kymatio. (arXiv:2204.08269v3 [cs.SD] UPDATED)
    Joint time-frequency scattering (JTFS) is a convolutional operator in the time-frequency domain which extracts spectrotemporal modulations at various rates and scales. It offers an idealized model of spectrotemporal receptive fields (STRF) in the primary auditory cortex, and thus may serve as a biologically plausible surrogate for human perceptual judgments at the scale of isolated audio events. Yet, prior implementations of JTFS and STRF have remained outside of the standard toolkit of perceptual similarity measures and evaluation methods for audio generation. We trace this issue down to three limitations: differentiability, speed, and flexibility. In this paper, we present an implementation of time-frequency scattering in Kymatio, an open-source Python package for scattering transforms. Unlike prior implementations, Kymatio accommodates NumPy and PyTorch as backends and is thus portable on both CPU and GPU. We demonstrate the usefulness of JTFS in Kymatio via three applications: unsupervised manifold learning of spectrotemporal modulations, supervised classification of musical instruments, and texture resynthesis of bioacoustic sounds.  ( 2 min )
    CausalNLP: A Practical Toolkit for Causal Inference with Text. (arXiv:2106.08043v4 [cs.CL] UPDATED)
    Causal inference is the process of estimating the effect or impact of a treatment on an outcome with other covariates as potential confounders (and mediators) that may need to be controlled. The vast majority of existing methods and systems for causal inference assume that all variables under consideration are categorical or numerical (e.g., gender, price, enrollment). In this paper, we present CausalNLP, a toolkit for inferring causality with observational data that includes text in addition to traditional numerical and categorical variables. CausalNLP employs the use of meta learners for treatment effect estimation and supports using raw text and its linguistic properties as a treatment, an outcome, or a "controlled-for" variable (e.g., confounder). The library is open source and available at: https://github.com/amaiya/causalnlp.  ( 2 min )
    Visual Similarity Attention. (arXiv:1911.07381v2 [cs.CV] UPDATED)
    While there has been substantial progress in learning suitable distance metrics, these techniques in general lack transparency and decision reasoning, i.e., explaining why the input set of images is similar or dissimilar. In this work, we solve this key problem by proposing the first method to generate generic visual similarity explanations with gradient-based attention. We demonstrate that our technique is agnostic to the specific similarity model type, e.g., we show applicability to Siamese, triplet, and quadruplet models. Furthermore, we make our proposed similarity attention a principled part of the learning process, resulting in a new paradigm for learning similarity functions. We demonstrate that our learning mechanism results in more generalizable, as well as explainable, similarity models. Finally, we demonstrate the generality of our framework by means of experiments on a variety of tasks, including image retrieval, person re-identification, and low-shot semantic segmentation.  ( 2 min )
    DiCOVA-Net: Diagnosing COVID-19 using Acoustics based on Deep Residual Network for the DiCOVA Challenge 2021. (arXiv:2107.06126v2 [cs.SD] UPDATED)
    In this paper, we propose a deep residual network-based method, namely the DiCOVA-Net, to identify COVID-19 infected patients based on the acoustic recording of their coughs. Since there are far more healthy people than infected patients, this classification problem faces the challenge of imbalanced data. To improve the model's ability to recognize the minority class (the infected patients), we introduce data augmentation and cost-sensitive methods into our model. Besides, considering the particularity of this task, we deploy some fine-tuning techniques to adjust the pre-trained ResNet50. Furthermore, to improve the model's generalizability, we use ensemble learning to integrate prediction results from multiple base classifiers generated using different random seeds. To evaluate the proposed DiCOVA-Net's performance, we conducted experiments with the DiCOVA challenge dataset. The results show that our method has achieved 85.43% in AUC, among the top of all competing teams.  ( 2 min )
    When is BERT Multilingual? Isolating Crucial Ingredients for Cross-lingual Transfer. (arXiv:2110.14782v3 [cs.CL] UPDATED)
    While recent work on multilingual language models has demonstrated their capacity for cross-lingual zero-shot transfer on downstream tasks, there is a lack of consensus in the community as to what shared properties between languages enable such transfer. Analyses involving pairs of natural languages are often inconclusive and contradictory since languages simultaneously differ in many linguistic aspects. In this paper, we perform a large-scale empirical study to isolate the effects of various linguistic properties by measuring zero-shot transfer between four diverse natural languages and their counterparts constructed by modifying aspects such as the script, word order, and syntax. Among other things, our experiments show that the absence of sub-word overlap significantly affects zero-shot transfer when languages differ in their word order, and there is a strong correlation between transfer performance and word embedding alignment between languages (e.g., R=0.94 on the task of NLI). Our results call for focus in multilingual models on explicitly improving word embedding alignment between languages rather than relying on its implicit emergence.
    Towards All-around Knowledge Transferring: Learning From Task-irrelevant Labels. (arXiv:2011.08470v2 [cs.LG] UPDATED)
    Deep neural models have achieved strong performance on numerous classification tasks, but they require sufficient manually annotated data. Since it is extremely time-consuming and expensive to annotate adequate data for each classification task, learning an empirically effective model that generalizes on small datasets has received increased attention. Existing efforts mainly focus on transferring task-relevant knowledge from other similar data to tackle the issue. These approaches have yielded remarkable improvements, yet they neglect the fact that task-irrelevant features could cause massive negative transfer effects. To date, no large-scale studies have been performed to investigate the impact of task-irrelevant features, let alone the utilization of such features. In this paper, we propose Task-Irrelevant Transfer Learning (TIRTL) to exploit task-irrelevant features, which are mainly extracted from task-irrelevant labels. In particular, we suppress the expression of task-irrelevant information and facilitate the learning process of classification. We also provide a theoretical explanation of our method. In addition, TIRTL does not conflict with methods that have previously exploited task-relevant knowledge and can be combined with them to enable the simultaneous utilization of task-relevant and task-irrelevant features for the first time. To verify the effectiveness of our theory and method, we conduct extensive experiments on facial expression recognition and digit recognition tasks. Our source code will also be made available in the future for reproducibility.
    Optimizing One-pixel Black-box Adversarial Attacks. (arXiv:2205.02116v1 [cs.CR])
    The output of Deep Neural Networks (DNN) can be altered by a small perturbation of the input in a black box setting by making multiple calls to the DNN. However, the high computation and time required makes the existing approaches unusable. This work seeks to improve the One-pixel (few-pixel) black-box adversarial attacks to reduce the number of calls to the network under attack. The One-pixel attack uses a non-gradient optimization algorithm to find pixel-level perturbations under the constraint of a fixed number of pixels, which causes the network to predict the wrong label for a given image. We show through experimental results how the choice of the optimization algorithm and initial positions to search can reduce function calls and increase attack success significantly, making the attack more practical in real-world settings.
    Wild Patterns Reloaded: A Survey of Machine Learning Security against Training Data Poisoning. (arXiv:2205.01992v1 [cs.LG])
    The success of machine learning is fueled by the increasing availability of computing power and large training datasets. The training data is used to learn new models or update existing ones, assuming that it is sufficiently representative of the data that will be encountered at test time. This assumption is challenged by the threat of poisoning, an attack that manipulates the training data to compromise the model's performance at test time. Although poisoning has been acknowledged as a relevant threat in industry applications, and a variety of different attacks and defenses have been proposed so far, a complete systematization and critical review of the field is still missing. In this survey, we provide a comprehensive systematization of poisoning attacks and defenses in machine learning, reviewing more than 200 papers published in the field in the last 15 years. We start by categorizing the current threat models and attacks, and then organize existing defenses accordingly. While we focus mostly on computer-vision applications, we argue that our systematization also encompasses state-of-the-art attacks and defenses for other data modalities. Finally, we discuss existing resources for research in poisoning, and shed light on the current limitations and open research questions in this research field.
    Saving Stochastic Bandits from Poisoning Attacks via Limited Data Verification. (arXiv:2102.07711v2 [cs.LG] UPDATED)
    We study bandit algorithms under data poisoning attacks in a bounded reward setting. We consider a strong attacker model in which the attacker can observe both the selected actions and their corresponding rewards and can contaminate the rewards with additive noise. We show that any bandit algorithm with regret $O(\log T)$ can be forced to suffer a regret $\Omega(T)$ with an expected amount of contamination $O(\log T)$. This amount of contamination is also necessary, as we prove that there exists an $O(\log T)$ regret bandit algorithm, specifically the classical UCB, that requires $\Omega(\log T)$ amount of contamination to suffer regret $\Omega(T)$. To combat such attacks, our second main contribution is to propose verification based mechanisms, which use limited verification to access a limited number of uncontaminated rewards. In particular, for the case of unlimited verifications, we show that with $O(\log T)$ expected number of verifications, a simple modified version of the ETC type bandit algorithm can restore the order optimal $O(\log T)$ regret irrespective of the amount of contamination used by the attacker. We also provide a UCB-like verification scheme, called Secure-UCB, that also enjoys full recovery from any attacks, also with $O(\log T)$ expected number of verifications. To derive a matching lower bound on the number of verifications, we prove that for any order-optimal bandit algorithm, this number of verifications $\Omega(\log T)$ is necessary to recover the order-optimal regret. On the other hand, when the number of verifications is bounded above by a budget $B$, we propose a novel algorithm, Secure-BARBAR, which provably achieves $O(\min\{C,T/\sqrt{B} \})$ regret with high probability against weak attackers where $C$ is the total amount of contamination by the attacker, which breaks the known $\Omega(C)$ lower bound of the non-verified setting if $C$ is large.  ( 3 min )
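As background for the "classical UCB" referred to in this abstract, the sketch below implements the textbook UCB1 index rule on a toy problem. It is not Secure-UCB or Secure-BARBAR and includes no verification or attack model; the function name `ucb1`, the callable-per-arm interface, and the exploration bonus `sqrt(2 ln t / n)` are choices made here for illustration.

```python
import math
import random


def ucb1(reward_fns, horizon, seed=0):
    """Run textbook UCB1 for `horizon` rounds.

    reward_fns: one callable per arm; each takes a random.Random instance
                and returns a reward in [0, 1].
    Returns (pull counts per arm, empirical mean reward per arm).
    """
    rng = random.Random(seed)
    k = len(reward_fns)
    counts = [0] * k
    means = [0.0] * k

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialization: pull every arm once
        else:
            # Pick the arm with the largest optimistic index:
            # empirical mean plus an exploration bonus.
            arm = max(
                range(k),
                key=lambda a: means[a] + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = reward_fns[arm](rng)
        counts[arm] += 1
        # Incremental update of the empirical mean.
        means[arm] += (reward - means[arm]) / counts[arm]

    return counts, means


# Two Bernoulli arms with success probabilities 0.2 and 0.8 (toy values).
arms = [
    lambda rng: 1.0 if rng.random() < 0.2 else 0.0,
    lambda rng: 1.0 if rng.random() < 0.8 else 0.0,
]
counts, means = ucb1(arms, horizon=2000)
```

Because the index depends only on observed rewards, an attacker who can shift those rewards can steer which arm looks best, which is the vulnerability the abstract's verification mechanisms are designed to address.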
    Virtual Analog Modeling of Distortion Circuits Using Neural Ordinary Differential Equations. (arXiv:2205.01897v1 [eess.AS])
    Recent research in deep learning has shown that neural networks can learn differential equations governing dynamical systems. In this paper, we adapt this concept to Virtual Analog (VA) modeling to learn the ordinary differential equations (ODEs) governing the first-order and the second-order diode clipper. The proposed models achieve performance comparable to state-of-the-art recurrent neural networks (RNNs) while using fewer parameters. We show that this approach does not require oversampling and allows the sampling rate to be increased after training has completed, which results in increased accuracy. Using a sophisticated numerical solver increases the accuracy at the cost of slower processing. ODEs learned this way do not require closed forms but are still physically interpretable.
    DeepFD: Automated Fault Diagnosis and Localization for Deep Learning Programs. (arXiv:2205.01938v1 [cs.SE])
    As Deep Learning (DL) systems are widely deployed for mission-critical applications, debugging such systems becomes essential. Most existing works identify and repair suspicious neurons on the trained Deep Neural Network (DNN), which, unfortunately, might be a detour. Specifically, several existing studies have reported that many unsatisfactory behaviors actually originate from faults residing in DL programs. Besides, locating faulty neurons is not actionable for developers, while locating the faulty statements in DL programs can provide developers with more useful information for debugging. Though a few recent studies were proposed to pinpoint the faulty statements in DL programs or the training settings (e.g. too large learning rate), they were mainly designed based on predefined rules, leading to many false alarms or false negatives, especially when the faults are beyond their capabilities. In view of these limitations, in this paper we propose DeepFD, a learning-based fault diagnosis and localization framework which maps the fault localization task to a learning problem. In particular, it infers the suspicious fault types by monitoring the runtime features extracted during DNN model training and then locates the diagnosed faults in DL programs. It overcomes the limitations by identifying the root causes of faults in DL programs instead of neurons and diagnosing the faults by a learning approach instead of a set of hard-coded rules. The evaluation exhibits the potential of DeepFD. It correctly diagnoses 52% of faulty DL programs, compared with around half that (27%) achieved by the best state-of-the-art works. Besides, for fault localization, DeepFD also outperforms the existing works, correctly locating 42% of faulty programs, which almost doubles the best result (23%) achieved by the existing works.
    Dynamic Sparse R-CNN. (arXiv:2205.02101v1 [cs.CV])
    Sparse R-CNN is a recent strong object detection baseline by set prediction on sparse, learnable proposal boxes and proposal features. In this work, we propose to improve Sparse R-CNN with two dynamic designs. First, Sparse R-CNN adopts a one-to-one label assignment scheme, where the Hungarian algorithm is applied to match only one positive sample for each ground truth. Such one-to-one assignment may not be optimal for the matching between the learned proposal boxes and ground truths. To address this problem, we propose dynamic label assignment (DLA) based on the optimal transport algorithm to assign increasing positive samples in the iterative training stages of Sparse R-CNN. We constrain the matching to be gradually looser in the sequential stages as the later stage produces the refined proposals with improved precision. Second, the learned proposal boxes and features remain fixed for different images in the inference process of Sparse R-CNN. Motivated by dynamic convolution, we propose dynamic proposal generation (DPG) to assemble multiple proposal experts dynamically for providing better initial proposal boxes and features for the consecutive training stages. DPG thereby can derive sample-dependent proposal boxes and features for inference. Experiments demonstrate that our method, named Dynamic Sparse R-CNN, can boost the strong Sparse R-CNN baseline with different backbones for object detection. Particularly, Dynamic Sparse R-CNN reaches the state-of-the-art 47.2% AP on the COCO 2017 validation set, surpassing Sparse R-CNN by 2.2% AP with the same ResNet-50 backbone.
    Understanding and Preventing Capacity Loss in Reinforcement Learning. (arXiv:2204.09560v2 [cs.LG] UPDATED)
    The reinforcement learning (RL) problem is rife with sources of non-stationarity, making it a notoriously difficult problem domain for the application of neural networks. We identify a mechanism by which non-stationary prediction targets can prevent learning progress in deep RL agents: capacity loss, whereby networks trained on a sequence of target values lose their ability to quickly update their predictions over time. We demonstrate that capacity loss occurs in a range of RL agents and environments, and is particularly damaging to performance in sparse-reward tasks. We then present a simple regularizer, Initial Feature Regularization (InFeR), that mitigates this phenomenon by regressing a subspace of features towards its value at initialization, leading to significant performance improvements in sparse-reward environments such as Montezuma's Revenge. We conclude that preventing capacity loss is crucial to enable agents to maximally benefit from the learning signals they obtain throughout the entire training trajectory.
    ASE: Large-Scale Reusable Adversarial Skill Embeddings for Physically Simulated Characters. (arXiv:2205.01906v1 [cs.GR])
    The incredible feats of athleticism demonstrated by humans are made possible in part by a vast repertoire of general-purpose motor skills, acquired through years of practice and experience. These skills not only enable humans to perform complex tasks, but also provide powerful priors for guiding their behaviors when learning new tasks. This is in stark contrast to what is common practice in physics-based character animation, where control policies are most typically trained from scratch for each task. In this work, we present a large-scale data-driven framework for learning versatile and reusable skill embeddings for physically simulated characters. Our approach combines techniques from adversarial imitation learning and unsupervised reinforcement learning to develop skill embeddings that produce life-like behaviors, while also providing an easy-to-control representation for use on new downstream tasks. Our models can be trained using large datasets of unstructured motion clips, without requiring any task-specific annotation or segmentation of the motion data. By leveraging a massively parallel GPU-based simulator, we are able to train skill embeddings using over a decade of simulated experiences, enabling our model to learn a rich and versatile repertoire of skills. We show that a single pre-trained model can be effectively applied to perform a diverse set of new tasks. Our system also allows users to specify tasks through simple reward functions, and the skill embedding then enables the character to automatically synthesize complex and naturalistic strategies in order to achieve the task objectives.
    Explain and Conquer: Personalised Text-based Reviews to Achieve Transparency. (arXiv:2205.01759v1 [cs.LG])
    There are many contexts where dyadic data is present. Social networking is a well-known example, where transparency has grown in importance. In these contexts, pairs of items are linked, building a network where interactions play a crucial role. Explaining why these relationships are established is core to addressing transparency. These explanations are often presented using text, thanks to the spread of natural language understanding tasks. We have focused on the TripAdvisor platform, considering the applicability to other dyadic data contexts. The items are a subset of users and restaurants, and the interactions are the reviews posted by these users. Our aim is to represent and explain pairs (user, restaurant) established by agents (e.g., a recommender system or a paid promotion mechanism), so that personalisation is taken into account. We propose the PTER (Personalised TExt-based Reviews) model. We predict, from the available reviews for a given restaurant, those that fit the specific user interactions. PTER leverages the BERT (Bidirectional Encoders Representations from Transformers) language model. We customised a deep neural network following the feature-based approach. The performance metrics show the validity of our labelling proposal. We defined an evaluation framework based on a clustering process to assess our personalised representation. PTER clearly outperforms the proposed adversary in 5 of the 6 datasets, with a minimum ratio improvement of 4%.
    Graph Self-Supervised Learning: A Survey. (arXiv:2103.00111v5 [cs.LG] UPDATED)
    Deep learning on graphs has attracted significant interest recently. However, most works have focused on (semi-) supervised learning, resulting in shortcomings including heavy label reliance, poor generalization, and weak robustness. To address these issues, self-supervised learning (SSL), which extracts informative knowledge through well-designed pretext tasks without relying on manual labels, has become a promising and trending learning paradigm for graph data. Unlike SSL in other domains such as computer vision and natural language processing, SSL on graphs has its own background, design ideas, and taxonomies. Under the umbrella of graph self-supervised learning, we present a timely and comprehensive review of the existing approaches which employ SSL techniques for graph data. We construct a unified framework that mathematically formalizes the paradigm of graph SSL. According to the objectives of pretext tasks, we divide these approaches into four categories: generation-based, auxiliary property-based, contrast-based, and hybrid approaches. We further describe the applications of graph SSL across various research fields and summarize the commonly used datasets, evaluation benchmarks, performance comparisons and open-source code for graph SSL. Finally, we discuss the remaining challenges and potential future directions in this research field.
    Efficient Few-Shot Fine-Tuning for Opinion Summarization. (arXiv:2205.02170v1 [cs.CL])
    Abstractive summarization models are typically pre-trained on large amounts of generic texts, then fine-tuned on tens or hundreds of thousands of annotated samples. However, in opinion summarization, large annotated datasets of reviews paired with reference summaries are not available and would be expensive to create. This calls for fine-tuning methods robust to overfitting on small datasets. In addition, generically pre-trained models are often not accustomed to the specifics of customer reviews and, after fine-tuning, yield summaries with disfluencies and semantic mistakes. To address these problems, we utilize an efficient few-shot method based on adapters which, as we show, can easily store in-domain knowledge. Instead of fine-tuning the entire model, we add adapters and pre-train them in a task-specific way on a large corpus of unannotated customer reviews, using held-out reviews as pseudo summaries. We then fine-tune the adapters on the small available human-annotated dataset. We show that this self-supervised adapter pre-training improves summary quality over standard fine-tuning by 2.0 and 1.3 ROUGE-L points on the Amazon and Yelp datasets, respectively. Finally, for summary personalization, we condition on aspect keyword queries, automatically created from generic datasets. In the same vein, we pre-train the adapters in a query-based manner on customer reviews and then fine-tune them on annotated datasets. This results in better-organized summary content reflected in improved coherence and fewer redundancies.
    Improved Orientation Estimation and Detection with Hybrid Object Detection Networks for Automotive Radar. (arXiv:2205.02111v1 [cs.CV])
    This paper presents novel hybrid architectures that combine grid- and point-based processing to improve the detection performance and orientation estimation of radar-based object detection networks. Purely grid-based detection models operate on a bird's-eye-view (BEV) projection of the input point cloud. These approaches suffer from a loss of detailed information through the discrete grid resolution. This applies in particular to radar object detection, where relatively coarse grid resolutions are commonly used to account for the sparsity of radar point clouds. In contrast, point-based models are not affected by this problem as they continuously process point clouds. However, they generally exhibit worse detection performances than grid-based methods. We show that a point-based model can extract neighborhood features, leveraging the exact relative positions of points, before grid rendering. This has significant benefits for a following convolutional detection backbone. In experiments on the public nuScenes dataset our hybrid architecture achieves improvements in terms of detection performance and orientation estimates over networks from previous literature.
    Leveraging Language to Learn Program Abstractions and Search Heuristics. (arXiv:2106.11053v3 [cs.LG] UPDATED)
    Inductive program synthesis, or inferring programs from examples of desired behavior, offers a general paradigm for building interpretable, robust, and generalizable machine learning systems. Effective program synthesis depends on two key ingredients: a strong library of functions from which to build programs, and an efficient search strategy for finding programs that solve a given task. We introduce LAPS (Language for Abstraction and Program Search), a technique for using natural language annotations to guide joint learning of libraries and neurally-guided search models for synthesis. When integrated into a state-of-the-art library learning system (DreamCoder), LAPS produces higher-quality libraries and improves search efficiency and generalization on three domains -- string editing, image composition, and abstract reasoning about scenes -- even when no natural language hints are available at test time.
    State Representation Learning for Goal-Conditioned Reinforcement Learning. (arXiv:2205.01965v1 [cs.LG])
    This paper presents a novel state representation for reward-free Markov decision processes. The idea is to learn, in a self-supervised manner, an embedding space where distances between pairs of embedded states correspond to the minimum number of actions needed to transition between them. Compared to previous methods, our approach does not require any domain knowledge, learning from offline and unlabeled data. We show how this representation can be leveraged to learn goal-conditioned policies, providing a notion of similarity between states and goals and a useful heuristic distance to guide planning and reinforcement learning algorithms. Finally, we empirically validate our method in classic control domains and multi-goal environments, demonstrating that our method can successfully learn representations in large and/or continuous domains.
    Few-Shot Backdoor Attacks on Visual Object Tracking. (arXiv:2201.13178v2 [cs.CV] UPDATED)
    Visual object tracking (VOT) has been widely adopted in mission-critical applications, such as autonomous driving and intelligent surveillance systems. In current practice, third-party resources such as datasets, backbone networks, and training platforms are frequently used to train high-performance VOT models. Whilst these resources bring certain convenience, they also introduce new security threats into VOT models. In this paper, we reveal such a threat where an adversary can easily implant hidden backdoors into VOT models by tampering with the training process. Specifically, we propose a simple yet effective few-shot backdoor attack (FSBA) that optimizes two losses alternately: 1) a feature loss defined in the hidden feature space, and 2) the standard tracking loss. We show that, once the backdoor is embedded into the target model by our FSBA, it can trick the model into losing track of specific objects even when the trigger only appears in one or a few frames. We examine our attack in both digital and physical-world settings and show that it can significantly degrade the performance of state-of-the-art VOT trackers. We also show that our attack is resistant to potential defenses, highlighting the vulnerability of VOT models to potential backdoor attacks.
    Zero-shot Sonnet Generation with Discourse-level Planning and Aesthetics Features. (arXiv:2205.01821v1 [cs.CL])
    Poetry generation, and creative language generation in general, usually suffers from the lack of large training data. In this paper, we present a novel framework to generate sonnets that does not require training on poems. We design a hierarchical framework which plans the poem sketch before decoding. Specifically, a content planning module is trained on non-poetic texts to obtain discourse-level coherence; then a rhyme module generates rhyme words and a polishing module introduces imagery and similes for aesthetic purposes. Finally, we design a constrained decoding algorithm to impose the meter-and-rhyme constraint on the generated sonnets. Automatic and human evaluation show that our multi-stage approach without training on poem corpora generates more coherent, poetic, and creative sonnets than several strong baselines.
    Learning Purified Feature Representations from Task-irrelevant Labels. (arXiv:2102.10955v2 [cs.LG] UPDATED)
    Learning an empirically effective model with generalization using limited data is a challenging task for deep neural networks. In this paper, we propose a novel learning framework called PurifiedLearning to exploit task-irrelevant features extracted from task-irrelevant labels when training models on small-scale datasets. Particularly, we purify feature representations by using the expression of task-irrelevant information, thus facilitating the learning process of classification. Our work is built on solid theoretical analysis and extensive experiments, which demonstrate the effectiveness of PurifiedLearning. According to the theory we proved, PurifiedLearning is model-agnostic and doesn't have any restrictions on the model needed, so it can be combined with any existing deep neural networks with ease to achieve better performance. The source code of this paper will be available in the future for reproducibility.
    Data Cleansing for Indoor Positioning Wi-Fi Fingerprinting Datasets. (arXiv:2205.02096v1 [eess.SP])
    Wearable and IoT devices requiring positioning and localisation services grow in number exponentially every year. This rapid growth also produces millions of data entries that need to be pre-processed prior to being used in any indoor positioning system to ensure the data quality and provide a high Quality of Service (QoS) to the end-user. In this paper, we offer a novel and straightforward data cleansing algorithm for WLAN fingerprinting radio maps. This algorithm is based on the correlation among fingerprints using the Received Signal Strength (RSS) values and the Access Point (AP) identifiers. We use those to compute the correlation among all samples in the dataset and remove fingerprints with a low level of correlation from the dataset. We evaluated the proposed method on 14 independent publicly-available datasets. As a result, an average of 14% of fingerprints were removed from the datasets. The 2D positioning error was reduced by 2.7% and the 3D positioning error by 5.3%, with a slight increase in the floor hit rate of 1.2% on average. Consequently, the average speed of position prediction was also increased by 14%.
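    The abstract describes removing fingerprints that correlate poorly with the rest of the radio map. A minimal sketch of that general idea follows. It is not the paper's algorithm: the function name `cleanse_fingerprints`, the use of Pearson correlation between RSS vectors, the mean-correlation score, and the `min_mean_corr` threshold are all assumptions made for illustration.

```python
import numpy as np


def cleanse_fingerprints(rss, min_mean_corr=0.2):
    """Return a boolean keep-mask over fingerprints (illustrative sketch only).

    rss: array of shape (n_samples, n_aps); row i holds the RSS values of
         fingerprint i, with columns ordered by AP identifier.
    min_mean_corr: hypothetical threshold on the mean correlation of a
         fingerprint with all other fingerprints. Needs at least two rows.
    """
    rss = np.asarray(rss, dtype=float)
    # Pearson correlation between every pair of fingerprints (rows).
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = np.corrcoef(rss)
    # Constant rows have undefined correlation; treat them as uncorrelated.
    corr = np.nan_to_num(corr, nan=0.0)
    # Ignore each fingerprint's correlation with itself.
    np.fill_diagonal(corr, np.nan)
    mean_corr = np.nanmean(corr, axis=1)
    return mean_corr >= min_mean_corr


if __name__ == "__main__":
    base = np.array([-40.0, -50.0, -60.0, -70.0, -80.0])
    # Five mutually consistent fingerprints plus one with a reversed pattern.
    radio_map = np.vstack([base + k for k in range(5)] + [base[::-1]])
    print(cleanse_fingerprints(radio_map))
```

    In practice the correlation would likely be computed among fingerprints that are expected to be similar (for example, those collected at nearby reference points), and the threshold would be tuned per dataset. Both choices are left open here.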
    Hypercomplex Image-to-Image Translation. (arXiv:2205.02087v1 [cs.CV])
    Image-to-image translation (I2I) aims at transferring the content representation from an input domain to an output one, bouncing along different target domains. Recent I2I generative models, which achieve outstanding results in this task, comprise a set of diverse deep networks, each with tens of millions of parameters. Moreover, images are usually three-dimensional, being composed of RGB channels, and common neural models do not take the correlation among these dimensions into account, losing beneficial information. In this paper, we propose to leverage hypercomplex algebra properties to define lightweight I2I generative models capable of preserving pre-existing relations among image dimensions, thus exploiting additional input information. On manifold I2I benchmarks, we show how the proposed Quaternion StarGANv2 and parameterized hypercomplex StarGANv2 (PHStarGANv2) reduce parameters and storage memory amount while ensuring high domain translation performance and good image quality as measured by FID and LPIPS scores. Full code is available at: https://github.com/ispamm/HI2I.
    BERTMap: A BERT-based Ontology Alignment System. (arXiv:2112.02682v4 [cs.AI] UPDATED)
    Ontology alignment (a.k.a ontology matching (OM)) plays a critical role in knowledge integration. Owing to the success of machine learning in many domains, it has been applied in OM. However, the existing methods, which often adopt ad-hoc feature engineering or non-contextual word embeddings, have not yet outperformed rule-based systems especially in an unsupervised setting. In this paper, we propose a novel OM system named BERTMap which can support both unsupervised and semi-supervised settings. It first predicts mappings using a classifier based on fine-tuning the contextual embedding model BERT on text semantics corpora extracted from ontologies, and then refines the mappings through extension and repair by utilizing the ontology structure and logic. Our evaluation with three alignment tasks on biomedical ontologies demonstrates that BERTMap can often perform better than the leading OM systems LogMap and AML.
    Few-Shot Document-Level Relation Extraction. (arXiv:2205.02048v1 [cs.CL])
    We present FREDo, a few-shot document-level relation extraction (FSDLRE) benchmark. As opposed to existing benchmarks which are built on sentence-level relation extraction corpora, we argue that document-level corpora provide more realism, particularly regarding none-of-the-above (NOTA) distributions. Therefore, we propose a set of FSDLRE tasks and construct a benchmark based on two existing supervised learning data sets, DocRED and sciERC. We adapt the state-of-the-art sentence-level method MNAV to the document-level and develop it further for improved domain adaptation. We find FSDLRE to be a challenging setting with interesting new characteristics such as the ability to sample NOTA instances from the support set. The data, code, and trained models are available online (https://github.com/nicpopovic/FREDo).
    Using Deep Reinforcement Learning to solve Optimal Power Flow problem with generator failures. (arXiv:2205.02108v1 [cs.LG])
    Deep Reinforcement Learning (DRL) is being used in many domains. One of the biggest advantages of DRL is that it enables the continuous improvement of a learning agent. In addition, the DRL framework is robust and flexible enough to be applicable to problems of varying nature and domain. The presented work demonstrates the use of the DRL technique to solve an Optimal Power Flow (OPF) problem. Two classical algorithms are presented to solve the OPF problem. The drawbacks of the vanilla DRL application are discussed, and an algorithm is suggested to improve the performance. Furthermore, a reward function for the OPF problem is presented that enables the solution of inherent issues in DRL. Reasons for divergence and degeneration in DRL are discussed, and the correct strategy to deal with them with respect to OPF is presented.
    The Limits of Word Level Differential Privacy. (arXiv:2205.02130v1 [cs.CR])
    As the issues of privacy and trust are receiving increasing attention within the research community, various attempts have been made to anonymize textual data. A significant subset of these approaches incorporate differentially private mechanisms to perturb word embeddings, thus replacing individual words in a sentence. While these methods represent very important contributions, have various advantages over other techniques and do show anonymization capabilities, they have several shortcomings. In this paper, we investigate these weaknesses and demonstrate significant mathematical constraints diminishing the theoretical privacy guarantee as well as major practical shortcomings with regard to the protection against deanonymization attacks, the preservation of content of the original sentences as well as the quality of the language output. Finally, we propose a new method for text anonymization based on transformer based language models fine-tuned for paraphrasing that circumvents most of the identified weaknesses and also offers a formal privacy guarantee. We evaluate the performance of our method via thorough experimentation and demonstrate superior performance over the discussed mechanisms.
    The Isabelle ENIGMA. (arXiv:2205.01981v1 [cs.AI])
    We significantly improve the performance of the E automated theorem prover on the Isabelle Sledgehammer problems by combining learning and theorem proving in several ways. In particular, we develop targeted versions of the ENIGMA guidance for the Isabelle problems, targeted versions of neural premise selection, and targeted strategies for E. The methods are trained in several iterations over hundreds of thousands of untyped and typed first-order problems extracted from Isabelle. Our final best single-strategy ENIGMA and premise selection system improves the best previous version of E by 25.3% in 15 seconds, also outperforming all other previous ATP and SMT systems.
    On Continual Model Refinement in Out-of-Distribution Data Streams. (arXiv:2205.02014v1 [cs.CL])
    Real-world natural language processing (NLP) models need to be continually updated to fix the prediction errors in out-of-distribution (OOD) data streams while overcoming catastrophic forgetting. However, existing continual learning (CL) problem setups cannot cover such a realistic and complex scenario. In response to this, we propose a new CL problem formulation dubbed continual model refinement (CMR). Compared to prior CL settings, CMR is more practical and introduces unique challenges (boundary-agnostic and non-stationary distribution shift, diverse mixtures of multiple OOD data clusters, error-centric streams, etc.). We extend several existing CL approaches to the CMR setting and evaluate them extensively. For benchmarking and analysis, we propose a general sampling algorithm to obtain dynamic OOD data streams with controllable non-stationarity, as well as a suite of metrics measuring various aspects of online performance. Our experiments and detailed analysis reveal the promise and challenges of the CMR problem, supporting that studying CMR in dynamic OOD streams can benefit the longevity of deployed NLP models in production.
    Nonstationary Bandit Learning via Predictive Sampling. (arXiv:2205.01970v1 [cs.LG])
    We propose predictive sampling as an approach to selecting actions that balance between exploration and exploitation in nonstationary bandit environments. When specialized to stationary environments, predictive sampling is equivalent to Thompson sampling. However, predictive sampling is effective across a range of nonstationary environments in which Thompson sampling suffers. We establish a general information-theoretic bound on the Bayesian regret of predictive sampling. We then specialize this bound to study a modulated Bernoulli bandit environment. Our analysis highlights a key advantage of predictive sampling over Thompson sampling: predictive sampling deprioritizes investments in exploration where acquired information will quickly become less relevant.
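    The abstract states that predictive sampling reduces to Thompson sampling in stationary environments. As a point of reference, here is a minimal sketch of standard Thompson sampling for a stationary Bernoulli bandit with Beta(1, 1) priors. It does not implement predictive sampling itself; the function name, the arm means, and the horizon are illustrative assumptions.

```python
import numpy as np


def thompson_sampling(true_means, horizon, seed=0):
    """Run Beta-Bernoulli Thompson sampling; return pull counts per arm."""
    rng = np.random.default_rng(seed)
    n_arms = len(true_means)
    successes = np.ones(n_arms)  # Beta prior alpha = 1 for every arm
    failures = np.ones(n_arms)   # Beta prior beta = 1 for every arm
    pulls = np.zeros(n_arms, dtype=int)
    for _ in range(horizon):
        # Sample one plausible mean per arm from its posterior, then act
        # greedily with respect to the sampled means.
        sampled_means = rng.beta(successes, failures)
        arm = int(np.argmax(sampled_means))
        reward = float(rng.random() < true_means[arm])
        # Conjugate posterior update for the pulled arm.
        successes[arm] += reward
        failures[arm] += 1.0 - reward
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    print(thompson_sampling([0.1, 0.9], horizon=2000))
```

    Because the posterior here never discounts old observations, this baseline keeps exploring on the basis of information that may have gone stale once arm means drift. That is the setting in which the abstract reports Thompson sampling suffering.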
    Concept Activation Vectors for Generating User-Defined 3D Shapes. (arXiv:2205.02102v1 [cs.CV])
    We explore the interpretability of 3D geometric deep learning models in the context of Computer-Aided Design (CAD). The field of parametric CAD can be limited by the difficulty of expressing high-level design concepts in terms of a few numeric parameters. In this paper, we use a deep learning architecture to encode high-dimensional 3D shapes into a vectorized latent representation that can be used to describe arbitrary concepts. Specifically, we train a simple auto-encoder to parameterize a dataset of complex shapes. To understand the latent encoded space, we use the idea of Concept Activation Vectors (CAV) to reinterpret the latent space in terms of user-defined concepts. This allows modification of a reference design to exhibit more or fewer characteristics of a chosen concept or group of concepts. We also test the statistical significance of the identified concepts and determine the sensitivity of a physical quantity of interest across the dataset.
    Pre-RTL DNN Hardware Evaluator With Fused Layer Support. (arXiv:2205.01729v1 [cs.AR])
    With the popularity of deep neural networks (DNNs), hardware accelerators are in demand for real-time execution. However, lengthy design processes and fast-evolving DNN models make it hard for hardware evaluation to meet time-to-market needs. This paper proposes a pre-RTL DNN hardware evaluator that supports conventional layer-by-layer processing as well as fused layer processing for low external bandwidth requirements. The evaluator supports two state-of-the-art accelerator architectures and finds the best hardware and layer fusion group. The experimental results show the layer fusion scheme can achieve 55.6% memory bandwidth reduction, 36.7% latency improvement and 49.2% energy reduction compared with layer-by-layer operation.
    Recurrent Flow Networks: A Recurrent Latent Variable Model for Density Modelling of Urban Mobility. (arXiv:2006.05256v2 [stat.ML] UPDATED)
    Mobility-on-demand (MoD) systems represent a rapidly developing mode of transportation wherein travel requests are dynamically handled by a coordinated fleet of vehicles. Crucially, the efficiency of an MoD system highly depends on how well supply and demand distributions are aligned in spatio-temporal space (i.e., to satisfy user demand, cars have to be available in the correct place and at the desired time). To achieve this alignment, we argue that predictive models should aim to explicitly disentangle temporal and spatial variability in the evolution of urban mobility demand. However, current approaches typically ignore this distinction by either treating both sources of variability jointly, or completely ignoring their presence in the first place. In this paper, we propose recurrent flow networks (RFN), where we explore the inclusion of (i) latent random variables in the hidden state of recurrent neural networks to model temporal variability, and (ii) normalizing flows to model the spatial distribution of mobility demand. We demonstrate how predictive models that explicitly disentangle spatial and temporal variability exhibit several desirable properties, and empirically show how this enables the generation of distributions matching potentially complex urban topologies.
    Palette: Image-to-Image Diffusion Models. (arXiv:2111.05826v2 [cs.CV] UPDATED)
    This paper develops a unified framework for image-to-image translation based on conditional diffusion models and evaluates this framework on four challenging image-to-image translation tasks, namely colorization, inpainting, uncropping, and JPEG restoration. Our simple implementation of image-to-image diffusion models outperforms strong GAN and regression baselines on all tasks, without task-specific hyper-parameter tuning, architecture customization, or any auxiliary loss or sophisticated new techniques needed. We uncover the impact of an L2 vs. L1 loss in the denoising diffusion objective on sample diversity, and demonstrate the importance of self-attention in the neural architecture through empirical studies. Importantly, we advocate a unified evaluation protocol based on ImageNet, with human evaluation and sample quality scores (FID, Inception Score, Classification Accuracy of a pre-trained ResNet-50, and Perceptual Distance against original images). We expect this standardized evaluation protocol to play a role in advancing image-to-image translation research. Finally, we show that a generalist, multi-task diffusion model performs as well or better than task-specific specialist counterparts. Check out https://diffusion-palette.github.io for an overview of the results.
    Machine Learning based Framework for Robust Price-Sensitivity Estimation with Application to Airline Pricing. (arXiv:2205.01875v1 [stat.ML])
    We consider the problem of dynamic pricing of a product in the presence of feature-dependent price sensitivity. Based on the Poisson semi-parametric approach, we construct a flexible yet interpretable demand model where the price related part is parametric while the remaining (nuisance) part of the model is non-parametric and can be modeled via sophisticated ML techniques. The estimation of price-sensitivity parameters of this model via direct one-stage regression techniques may lead to biased estimates. We propose a two-stage estimation methodology which makes the estimation of the price-sensitivity parameters robust to biases in the nuisance parameters of the model. In the first stage, we construct the estimators of observed purchases and price given the feature vector using sophisticated ML estimators like deep neural networks. Using the estimators from the first stage, in the second stage we leverage a Bayesian dynamic generalized linear model to estimate the price-sensitivity parameters. We test the performance of the proposed estimation schemes on simulated and real sales transaction data from the airline industry. Our numerical studies demonstrate that the two-stage approach provides more accurate estimates of price-sensitivity parameters than the direct one-stage approach.
    Self-supervised learning unveils morphological clusters behind lung cancer types and prognosis. (arXiv:2205.01931v1 [cs.CV])
    Histopathological images of tumors contain abundant information about how tumors grow and how they interact with their micro-environment. Characterizing and improving our understanding of phenotypes could reveal factors related to tumor progression and their underpinning biological processes, ultimately improving diagnosis and treatment. In recent years, the field of histological deep learning applications has seen great progress, yet most of these applications focus on a supervised approach, relating tissue and associated sample annotations. Supervised approaches have their impact limited by two factors. Firstly, high-quality labels are expensive in time and effort, which makes them not easily scalable. Secondly, these methods focus on predicting annotations from histological images, fundamentally restricting the discovery of new tissue phenotypes. These limitations emphasize the importance of using new methods that can characterize tissue by the features enclosed in the image, without pre-defined annotation or supervision. We present Phenotype Representation Learning (PRL), a methodology to extract histomorphological phenotypes through self-supervised learning and community detection. PRL creates phenotype clusters by identifying tissue patterns that share common morphological and cellular features, allowing to describe whole slide images through compositional representations of cluster contributions. We used this framework to analyze histopathology slides of LUAD and LUSC lung cancer subtypes from TCGA and NYU cohorts. We show that PRL achieves a robust lung subtype prediction providing statistically relevant phenotypes for each lung subtype. We further demonstrate the significance of these phenotypes in lung adenocarcinoma overall and recurrence free survival, relating clusters with patient outcomes, cell types, grown patterns, and omic-based immune signatures.
    XLTime: A Cross-Lingual Knowledge Transfer Framework for Temporal Expression Extraction. (arXiv:2205.01757v1 [cs.CL])
    Temporal Expression Extraction (TEE) is essential for understanding time in natural language. It has applications in Natural Language Processing (NLP) tasks such as question answering, information retrieval, and causal inference. To date, work in this area has mostly focused on English as there is a scarcity of labeled data for other languages. We propose XLTime, a novel framework for multilingual TEE. XLTime works on top of pre-trained language models and leverages multi-task learning to prompt cross-language knowledge transfer both from English and within the non-English languages. XLTime alleviates problems caused by a shortage of data in the target language. We apply XLTime with different language models and show that it outperforms the previous automatic SOTA methods on French, Spanish, Portuguese, and Basque, by large margins. XLTime also closes the gap considerably on the handcrafted HeidelTime method.
    Discrete Simulation Optimization for Tuning Machine Learning Method Hyperparameters. (arXiv:2201.05978v2 [cs.LG] UPDATED)
    Machine learning (ML) methods are used in most technical areas such as image recognition, product recommendation, financial analysis, medical diagnosis, and predictive maintenance. An important aspect of implementing ML methods involves controlling the learning process for the ML method so as to maximize the performance of the method under consideration. Hyperparameter tuning is the process of selecting a suitable set of ML method parameters that control its learning process. In this work, we demonstrate the use of discrete simulation optimization methods such as ranking and selection (R&S) and random search for identifying a hyperparameter set that maximizes the performance of an ML method. Specifically, we use the KN R&S method and the stochastic ruler random search method and one of its variations for this purpose. We also construct the theoretical basis for applying the KN method, which determines the optimal solution with a statistical guarantee via solution space enumeration. In comparison, the stochastic ruler method asymptotically converges to global optima and incurs smaller computational overheads. We demonstrate the application of these methods to a wide variety of machine learning models, including deep neural network models used for time series prediction and image classification. We benchmark our application of these methods with state-of-the-art hyperparameter optimization libraries such as hyperopt and mango. The KN method consistently outperforms hyperopt's random search (RS) and Tree of Parzen Estimators (TPE) methods. The stochastic ruler method outperforms the hyperopt RS method and offers statistically comparable performance with respect to hyperopt's TPE method and the mango algorithm.
    Growing Isotropic Neural Cellular Automata. (arXiv:2205.01681v1 [cs.NE])
    Modeling the ability of multicellular organisms to build and maintain their bodies through local interactions between individual cells (morphogenesis) is a long-standing challenge of developmental biology. Recently, the Neural Cellular Automata (NCA) model was proposed as a way to find local system rules that produce a desired global behaviour, such as growing and persisting a predefined pattern, by repeatedly applying the same rule over a grid starting from a single cell. In this work we argue that the original Growing NCA model has an important limitation: anisotropy of the learned update rule. This implies the presence of an external factor that orients the cells in a particular direction. In other words, 'physical' rules of the underlying system are not invariant to rotation, thus prohibiting the existence of differently oriented instances of the target pattern on the same grid. We propose a modified Isotropic NCA model that does not have this limitation. We demonstrate that cell systems can be trained to grow accurate asymmetrical patterns through either of two methods: by breaking symmetries using structured seeds; or by introducing a rotation-reflection invariant training objective and relying on symmetry breaking caused by asynchronous cell updates.
    fairlib: A Unified Framework for Assessing and Improving Classification Fairness. (arXiv:2205.01876v1 [cs.LG])
    This paper presents fairlib, an open-source framework for assessing and improving classification fairness. It provides a systematic framework for quickly reproducing existing baseline models, developing new methods, evaluating models with different metrics, and visualizing their results. Its modularity and extensibility enable the framework to be used for diverse types of inputs, including natural language, images, and audio. In detail, we implement 14 debiasing methods, including pre-processing, at-training-time, and post-processing approaches. The built-in metrics cover the most commonly used fairness criterion and can be further generalized and customized for fairness evaluation.
    Self-focusing virtual screening with active design space pruning. (arXiv:2205.01753v1 [q-bio.QM])
    High-throughput virtual screening is an indispensable technique utilized in the discovery of small molecules. In cases where the library of molecules is exceedingly large, the cost of an exhaustive virtual screen may be prohibitive. Model-guided optimization has been employed to lower these costs through dramatic increases in sample efficiency compared to random selection. However, these techniques introduce new costs to the workflow through the surrogate model training and inference steps. In this study, we propose an extension to the framework of model-guided optimization that mitigates inferences costs using a technique we refer to as design space pruning (DSP), which irreversibly removes poor-performing candidates from consideration. We study the application of DSP to a variety of optimization tasks and observe significant reductions in overhead costs while exhibiting similar performance to the baseline optimization. DSP represents an attractive extension of model-guided optimization that can limit overhead costs in optimization settings where these costs are non-negligible relative to objective costs, such as docking.
    Towards Theoretical Analysis of Transformation Complexity of ReLU DNNs. (arXiv:2205.01940v1 [cs.LG])
    This paper aims to theoretically analyze the complexity of feature transformations encoded in DNNs with ReLU layers. We propose metrics to measure three types of complexities of transformations based on the information theory. We further discover and prove the strong correlation between the complexity and the disentanglement of transformations. Based on the proposed metrics, we analyze two typical phenomena of the change of the transformation complexity during the training process, and explore the ceiling of a DNN's complexity. The proposed metrics can also be used as a loss to learn a DNN with the minimum complexity, which also controls the over-fitting level of the DNN and influences adversarial robustness, adversarial transferability, and knowledge consistency. Comprehensive comparative studies have provided new perspectives to understand the DNN.
    EllSeg: An Ellipse Segmentation Framework for Robust Gaze Tracking. (arXiv:2007.09600v2 [cs.CV] UPDATED)
    Ellipse fitting, an essential component in pupil or iris tracking based video oculography, is performed on previously segmented eye parts generated using various computer vision techniques. Several factors, such as occlusions due to eyelid shape, camera position or eyelashes, frequently break ellipse fitting algorithms that rely on well-defined pupil or iris edge segments. In this work, we propose training a convolutional neural network to directly segment entire elliptical structures and demonstrate that such a framework is robust to occlusions and offers superior pupil and iris tracking performance (at least 10% and 24% increase in pupil and iris center detection rate respectively within a two-pixel error margin) compared to using standard eye parts segmentation for multiple publicly available synthetic segmentation datasets.
    Self-Taught Metric Learning without Labels. (arXiv:2205.01903v1 [cs.CV])
    We present a novel self-taught framework for unsupervised metric learning, which alternates between predicting class-equivalence relations between data through a moving average of an embedding model and learning the model with the predicted relations as pseudo labels. At the heart of our framework lies an algorithm that investigates contexts of data on the embedding space to predict their class-equivalence relations as pseudo labels. The algorithm enables efficient end-to-end training since it demands no off-the-shelf module for pseudo labeling. Also, the class-equivalence relations provide rich supervisory signals for learning an embedding space. On standard benchmarks for metric learning, it clearly outperforms existing unsupervised learning methods and sometimes even beats supervised learning models using the same backbone network. It is also applied to semi-supervised metric learning as a way of exploiting additional unlabeled data, and achieves the state of the art by boosting performance of supervised learning substantially.
    Provably Confidential Language Modelling. (arXiv:2205.01863v1 [cs.CL])
    Large language models are shown to memorize privacy information such as social security numbers in training data. Given the sheer scale of the training corpus, it is challenging to screen and filter these privacy data, either manually or automatically. In this paper, we propose Confidentially Redacted Training (CRT), a method to train language generation models while protecting the confidential segments. We borrow ideas from differential privacy (which solves a related but distinct problem) and show that our method is able to provably prevent unintended memorization by randomizing parts of the training process. Moreover, we show that redaction with an approximately correct screening policy amplifies the confidentiality guarantee. We implement the method for both LSTM and GPT language models. Our experimental results show that the models trained by CRT obtain almost the same perplexity while preserving strong confidentiality.
    Differentiable Simulation of Soft Multi-body Systems. (arXiv:2205.01758v1 [cs.LG])
    We present a method for differentiable simulation of soft articulated bodies. Our work enables the integration of differentiable physical dynamics into gradient-based pipelines. We develop a top-down matrix assembly algorithm within Projective Dynamics and derive a generalized dry friction model for soft continuum using a new matrix splitting strategy. We derive a differentiable control framework for soft articulated bodies driven by muscles, joint torques, or pneumatic tubes. The experiments demonstrate that our designs make soft body simulation more stable and realistic compared to other frameworks. Our method accelerates the solution of system identification problems by more than an order of magnitude, and enables efficient gradient-based learning of motion control with soft robots.
    DeeptDCS: Deep Learning-Based Estimation of Currents Induced During Transcranial Direct Current Stimulation. (arXiv:2205.01858v1 [q-bio.QM])
    Objective: Transcranial direct current stimulation (tDCS) is a non-invasive brain stimulation technique used to generate conduction currents in the head and disrupt brain functions. To rapidly evaluate the tDCS-induced current density in near real-time, this paper proposes a deep learning-based emulator, named DeeptDCS. Methods: The emulator leverages Attention U-net taking the volume conductor models (VCMs) of head tissues as inputs and outputting the three-dimensional current density distribution across the entire head. The electrode configurations are also incorporated into VCMs without increasing the number of input channels; this enables the straightforward incorporation of the non-parametric features of electrodes (e.g., thickness, shape, size, and position) in the training and testing of the proposed emulator. Results: Attention U-net outperforms standard U-net and its other three variants (Residual U-net, Attention Residual U-net, and Multi-scale Residual U-net) in terms of accuracy. The generalization ability of DeeptDCS to non-trained electrode positions can be greatly enhanced through fine-tuning the model. The computational time required by one emulation via DeeptDCS is a fraction of a second. Conclusion: DeeptDCS is at least two orders of magnitude faster than a physics-based open-source simulator, while providing satisfactorily accurate results. Significance: The high computational efficiency permits the use of DeeptDCS in applications requiring its repetitive execution, such as uncertainty quantification and optimization studies of tDCS.
    Uncertainty estimation of pedestrian future trajectory using Bayesian approximation. (arXiv:2205.01887v1 [cs.LG])
    Past research on pedestrian trajectory forecasting mainly focused on deterministic predictions which provide only point estimates of future states. These future estimates can help an autonomous vehicle plan its trajectory and avoid collision. However, under dynamic traffic scenarios, planning based on deterministic predictions is not trustworthy. Rather, estimating the uncertainty associated with the predicted states with a certain level of confidence can lead to robust path planning. Hence, the authors propose to quantify this uncertainty during forecasting using stochastic approximation which deterministic approaches fail to capture. The current method is simple and applies Bayesian approximation during inference to standard neural network architectures for estimating uncertainty. The authors compared the predictions between the probabilistic neural network (NN) models with the standard deterministic models. The results indicate that the mean predicted path of probabilistic models was closer to the ground truth when compared with the deterministic prediction. Further, the effect of stochastic dropout of weights and long-term prediction on future state uncertainty has been studied. It was found that the probabilistic models produced better performance metrics like average displacement error (ADE) and final displacement error (FDE). Finally, the study has been extended to multiple datasets providing a comprehensive comparison for each model.
    Crystal Twins: Self-supervised Learning for Crystalline Material Property Prediction. (arXiv:2205.01893v1 [cs.LG])
    Machine learning (ML) models have been widely successful in the prediction of material properties. However, large labeled datasets required for training accurate ML models are elusive and computationally expensive to generate. Recent advances in Self-Supervised Learning (SSL) frameworks capable of training ML models on unlabeled data have mitigated this problem and demonstrated superior performance in computer vision and natural language processing tasks. Drawing inspiration from the developments in SSL, we introduce Crystal Twins (CT): an SSL method for crystalline materials property prediction. Using a large unlabeled dataset, we pre-train a Graph Neural Network (GNN) by applying the redundancy reduction principle to the graph latent embeddings of augmented instances obtained from the same crystalline system. By sharing the pre-trained weights when fine-tuning the GNN for regression tasks, we significantly improve the performance for 7 challenging material property prediction benchmarks.
    Branch & Learn for Recursively and Iteratively Solvable Problems in Predict+Optimize. (arXiv:2205.01672v1 [cs.LG])
    This paper proposes Branch & Learn, a framework for Predict+Optimize to tackle optimization problems containing parameters that are unknown at the time of solving. Given an optimization problem solvable by a recursive algorithm satisfying simple conditions, we show how a corresponding learning algorithm can be constructed directly and methodically from the recursive algorithm. Our framework applies also to iterative algorithms by viewing them as a degenerate form of recursion. Extensive experimentation shows better performance for our proposal over classical and state-of-the-art approaches.
    Deep Sequence Modeling for Anomalous ISP Traffic Prediction. (arXiv:2205.01685v1 [cs.LG])
    Internet traffic in the real world is susceptible to various external and internal factors which may abruptly change the normal traffic flow. Those unexpected changes are considered outliers in traffic. Deep sequence models have been used to predict complex IP traffic, but their comparative performance for anomalous traffic has not been studied extensively. In this paper, we investigated and evaluated the performance of different deep sequence models for anomalous traffic prediction. Several deep sequence models were implemented to predict real traffic with and without outliers and to show the significance of outlier detection in real-world traffic prediction. First, two outlier detection techniques, the Three-Sigma rule and Isolation Forest, were applied to identify anomalies. Second, we adjusted those abnormal data points using the Backward Filling technique before training the model. Finally, the performance of different models was compared for abnormal and adjusted traffic. LSTM_Encoder_Decoder (LSTM_En_De) is the best prediction model in our experiment, reducing the deviation between actual and predicted traffic by more than 11% after adjusting the outliers. All other models, including Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), LSTM_En_De with Attention layer (LSTM_En_De_Atn), and Gated Recurrent Unit (GRU), also show better prediction after the outliers are replaced, decreasing prediction error by more than 29%, 24%, 19%, and 10%, respectively. Our experimental results indicate that the outliers in the data can significantly impact the quality of the prediction. Thus, outlier detection and mitigation assist the deep sequence model in learning the general trend and making better predictions.
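    The outlier adjustment described in this abstract (flag points with the Three-Sigma rule, then repair them by backward filling) can be illustrated with a minimal sketch. The function names `three_sigma_mask` and `backward_fill` are hypothetical, and the assumption that "Backward Filling" means replacing a flagged point with the next unflagged observation is ours, not confirmed by the abstract.

```python
from statistics import mean, pstdev


def three_sigma_mask(values, k=3.0):
    """Return a list of booleans, True where a value lies more than
    k standard deviations from the mean (the Three-Sigma rule for k=3)."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # A constant series has no outliers under this rule.
        return [False] * len(values)
    return [abs(v - mu) > k * sigma for v in values]


def backward_fill(values, mask):
    """Replace each flagged value with the next unflagged value.
    Assumption: flagged values at the end of the series, which have
    no later valid observation, are left unchanged."""
    filled = list(values)
    next_valid = None
    # Walk from the end so the most recent valid value is always known.
    for i in range(len(filled) - 1, -1, -1):
        if mask[i]:
            if next_valid is not None:
                filled[i] = next_valid
        else:
            next_valid = filled[i]
    return filled
```

    Note that with very short series a single extreme point inflates the standard deviation enough to hide itself, so the rule is only meaningful on reasonably long windows.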
    Frequency Domain-Based Detection of Generated Audio. (arXiv:2205.01806v1 [cs.SD])
    Attackers may manipulate audio with the intent of presenting falsified reports, changing an opinion of a public figure, and winning influence and power. The prevalence of inauthentic multimedia continues to rise, so it is imperative to develop a set of tools that determines the legitimacy of media. We present a method that analyzes audio signals to determine whether they contain real human voices or fake human voices (i.e., voices generated by neural acoustic and waveform models). Instead of analyzing the audio signals directly, the proposed approach converts the audio signals into spectrogram images displaying frequency, intensity, and temporal content and evaluates them with a Convolutional Neural Network (CNN). Trained on both genuine human voice signals and synthesized voice signals, we show our approach achieves high accuracy on this classification task.
    Spatial-Temporal Meta-path Guided Explainable Crime Prediction. (arXiv:2205.01901v1 [cs.LG])
    Exposure to crime and violence can harm individuals' quality of life and the economic growth of communities. In light of the rapid development in machine learning, there is a rise in the need to explore automated solutions to prevent crimes. With the increasing availability of both fine-grained urban and public service data, there is a recent surge in fusing such cross-domain information to facilitate crime prediction. By capturing the information about social structure, environment, and crime trends, existing machine learning predictive models have explored the dynamic crime patterns from different views. However, these approaches mostly convert such multi-source knowledge into implicit and latent representations (e.g., learned embeddings of districts), making it still a challenge to investigate the impacts of explicit factors for the occurrences of crimes behind the scenes. In this paper, we present a Spatial-Temporal Metapath guided Explainable Crime prediction (STMEC) framework to capture dynamic patterns of crime behaviours and explicitly characterize how the environmental and social factors mutually interact to produce the forecasts. Extensive experiments show the superiority of STMEC compared with other advanced spatiotemporal models, especially in predicting felonies (e.g., robberies and assaults with dangerous weapons).
    ASTROMER: A transformer-based embedding for the representation of light curves. (arXiv:2205.01677v1 [astro-ph.IM])
    Taking inspiration from natural language embeddings, we present ASTROMER, a transformer-based model to create representations of light curves. ASTROMER was trained on millions of MACHO R-band samples, and it can be easily fine-tuned to match specific domains associated with downstream tasks. As an example, this paper shows the benefits of using pre-trained representations to classify variable stars. In addition, we provide a python library including all functionalities employed in this work. Our library includes the pre-trained models that can be used to enhance the performance of deep learning models, decreasing computational resources while achieving state-of-the-art results.
    Don't sweat the small stuff, classify the rest: Sample Shielding to protect text classifiers against adversarial attacks. (arXiv:2205.01714v1 [cs.CL])
    Deep learning (DL) is being used extensively for text classification. However, researchers have demonstrated the vulnerability of such classifiers to adversarial attacks. Attackers modify the text in a way which misleads the classifier while keeping the original meaning close to intact. State-of-the-art (SOTA) attack algorithms follow the general principle of making minimal changes to the text so as to not jeopardize semantics. Taking advantage of this, we propose a novel and intuitive defense strategy called Sample Shielding. It is attacker and classifier agnostic, does not require any reconfiguration of the classifier or external resources and is simple to implement. Essentially, we sample subsets of the input text, classify them and summarize these into a final decision. We shield three popular DL text classifiers with Sample Shielding, test their resilience against four SOTA attackers across three datasets in a realistic threat setting. Even when given the advantage of knowing about our shielding strategy the adversary's attack success rate is <=10% with only one exception and often < 5%. Additionally, Sample Shielding maintains near original accuracy when applied to original texts. Crucially, we show that the 'make minimal changes' approach of SOTA attackers leads to critical vulnerabilities that can be defended against with an intuitive sampling strategy.
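    The idea of sampling subsets of the input, classifying each, and summarizing the results can be sketched as follows. This is an illustration only: the function name `sample_shield`, the word-level sampling, the `keep_frac` and `n_samples` parameters, and the majority vote used to combine predictions are all assumptions, and `classify` stands for any text classifier supplied by the caller.

```python
import random
from collections import Counter


def sample_shield(text, classify, n_samples=5, keep_frac=0.5, seed=0):
    """Classify several random word subsets of `text` and return the
    most common label (majority vote is an assumed aggregation rule)."""
    words = text.split()
    if not words:
        # Nothing to sample from; fall back to a single direct call.
        return classify(text)
    rng = random.Random(seed)  # local RNG keeps the result reproducible
    k = max(1, int(len(words) * keep_frac))
    votes = []
    for _ in range(n_samples):
        # Sample k word positions without replacement, keep word order.
        idx = sorted(rng.sample(range(len(words)), k))
        votes.append(classify(" ".join(words[i] for i in idx)))
    return Counter(votes).most_common(1)[0][0]
```

    Because an attack that changes only a few words will be absent from many of the sampled subsets, most votes are cast on text that is mostly unperturbed.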
    The ICML 2022 Expressive Vocalizations Workshop and Competition: Recognizing, Generating, and Personalizing Vocal Bursts. (arXiv:2205.01780v1 [eess.AS])
    The ICML Expressive Vocalization (ExVo) Competition is focused on understanding and generating vocal bursts: laughs, gasps, cries, and other non-verbal vocalizations that are central to emotional expression and communication. ExVo 2022 includes three competition tracks using a large-scale dataset of 59,201 vocalizations from 1,702 speakers. The first, ExVo-MultiTask, requires participants to train a multi-task model to recognize expressed emotions and demographic traits from vocal bursts. The second, ExVo-Generate, requires participants to train a generative model that produces vocal bursts conveying ten different emotions. The third, ExVo-FewShot, requires participants to leverage few-shot learning incorporating speaker identity to train a model for the recognition of 10 emotions conveyed by vocal bursts. This paper describes the three tracks and provides performance measures for baseline models using state-of-the-art machine learning strategies. The baselines for each track are as follows: for ExVo-MultiTask, a combined score, computing the harmonic mean of Concordance Correlation Coefficient (CCC), Unweighted Average Recall (UAR), and inverted Mean Absolute Error (MAE) ($S_{MTL}$), is at best 0.335 $S_{MTL}$; for ExVo-Generate, we report Fréchet inception distance (FID) scores ranging from 4.81 to 8.27 (depending on the emotion) between the training set and generated samples. We then combine the inverted FID with perceptual ratings of the generated samples ($S_{Gen}$) and obtain 0.174 $S_{Gen}$; and for ExVo-FewShot, a mean CCC of 0.444 is obtained.
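    The combined multi-task score is described as a harmonic mean of three per-task metrics. A minimal sketch of that combination is given below; the function name `combined_score` is hypothetical, and the inverted MAE is taken as an already computed input because the abstract does not say how the inversion is done.

```python
from statistics import harmonic_mean


def combined_score(ccc, uar, inverted_mae):
    """Harmonic mean of three per-task metrics.
    All inputs must be non-negative: statistics.harmonic_mean raises
    StatisticsError on negative values, which matters because a CCC
    can in principle be negative."""
    return harmonic_mean([ccc, uar, inverted_mae])
```

    The harmonic mean is dominated by the smallest component, so a model cannot reach a high combined score by excelling at one task while neglecting another.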
    Learning Abstract and Transferable Representations for Planning. (arXiv:2205.02092v1 [cs.LG])
    We are concerned with the question of how an agent can acquire its own representations from sensory data. We restrict our focus to learning representations for long-term planning, a class of problems that state-of-the-art learning methods are unable to solve. We propose a framework for autonomously learning state abstractions of an agent's environment, given a set of skills. Importantly, these abstractions are task-independent, and so can be reused to solve new tasks. We demonstrate how an agent can use an existing set of options to acquire representations from ego- and object-centric observations. These abstractions can immediately be reused by the same agent in new environments. We show how to combine these portable representations with problem-specific ones to generate a sound description of a specific task that can be used for abstract planning. Finally, we show how to autonomously construct a multi-level hierarchy consisting of increasingly abstract representations. Since these hierarchies are transferable, higher-order concepts can be reused in new tasks, relieving the agent from relearning them and improving sample efficiency. Our results demonstrate that our approach allows an agent to transfer previous knowledge to new tasks, improving sample efficiency as the number of tasks increases.
    MemSE: Fast MSE Prediction for Noisy Memristor-Based DNN Accelerators. (arXiv:2205.01707v1 [cs.LG])
    Memristors enable the computation of matrix-vector multiplications (MVM) in memory and, therefore, show great potential in highly increasing the energy efficiency of deep neural network (DNN) inference accelerators. However, computations in memristors suffer from hardware non-idealities and are subject to different sources of noise that may negatively impact system performance. In this work, we theoretically analyze the mean squared error of DNNs that use memristor crossbars to compute MVM. We take into account both the quantization noise, due to the necessity of reducing the DNN model size, and the programming noise, stemming from the variability during the programming of the memristance value. Simulations on pre-trained DNN models showcase the accuracy of the analytical prediction. Furthermore, the proposed method is almost two orders of magnitude faster than Monte-Carlo simulation, thus making it possible to optimize the implementation parameters to achieve minimal error for a given power constraint.
    FastMapSVM: Classifying Complex Objects Using the FastMap Algorithm and Support-Vector Machines. (arXiv:2204.05112v2 [cs.CV] UPDATED)
    Neural Networks and related Deep Learning methods are currently at the leading edge of technologies used for classifying objects. However, they generally demand large amounts of time and data for model training; and their learned models can sometimes be difficult to interpret. In this paper, we re-introduce FastMapSVM, an interpretable Machine Learning framework for classifying complex objects. FastMapSVM combines the strengths of FastMap and Support-Vector Machines. FastMap is an efficient linear-time algorithm that maps complex objects to points in a Euclidean space, while preserving pairwise non-Euclidean distances between them. We demonstrate the efficiency and effectiveness of FastMapSVM in the context of classifying seismograms. We show that its performance, in terms of precision, recall, and accuracy, is comparable to that of other state-of-the-art methods. However, compared to other methods, FastMapSVM uses significantly smaller amounts of time and data for model training. It also provides a perspicuous visualization of the objects and the classification boundaries between them. We expect FastMapSVM to be viable for classification tasks in many other real-world domains.
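    The projection at the core of the classic FastMap algorithm can be sketched in a few lines: given two pivot objects, each object receives one coordinate from the law of cosines, using only pairwise distances. The function name `fastmap_coordinate` is hypothetical, and the sketch covers a single coordinate only; pivot selection and the residual-distance update used for further dimensions are omitted, and nothing here is taken from the FastMapSVM implementation itself.

```python
def fastmap_coordinate(dist, pivot_a, pivot_b, obj):
    """Project `obj` onto the line through two pivot objects.
    `dist` is any symmetric distance function on the objects.
    Uses the law of cosines:
        x = (d(a,o)^2 + d(a,b)^2 - d(b,o)^2) / (2 * d(a,b))
    """
    d_ab = dist(pivot_a, pivot_b)
    if d_ab == 0:
        # Identical pivots give no direction to project onto.
        return 0.0
    d_ao = dist(pivot_a, obj)
    d_bo = dist(pivot_b, obj)
    return (d_ao ** 2 + d_ab ** 2 - d_bo ** 2) / (2 * d_ab)
```

    For objects that already lie on a line, with absolute difference as the distance, the coordinate reproduces each object's offset from the first pivot.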
    SVTS: Scalable Video-to-Speech Synthesis. (arXiv:2205.02058v1 [cs.SD])
    Video-to-speech synthesis (also known as lip-to-speech) refers to the translation of silent lip movements into the corresponding audio. This task has received an increasing amount of attention due to its self-supervised nature (i.e., can be trained without manual labelling) combined with the ever-growing collection of audio-visual data available online. Despite these strong motivations, contemporary video-to-speech works focus mainly on small- to medium-sized corpora with substantial constraints in both vocabulary and setting. In this work, we introduce a scalable video-to-speech framework consisting of two components: a video-to-spectrogram predictor and a pre-trained neural vocoder, which converts the mel-frequency spectrograms into waveform audio. We achieve state-of-the-art results for GRID and considerably outperform previous approaches on LRW. More importantly, by focusing on spectrogram prediction using a simple feedforward model, we can efficiently and effectively scale our method to very large and unconstrained datasets: To the best of our knowledge, we are the first to show intelligible results on the challenging LRS3 dataset.
    Can Rationalization Improve Robustness?. (arXiv:2204.11790v2 [cs.CL] UPDATED)
    A growing line of work has investigated the development of neural NLP models that can produce rationales--subsets of input that can explain their model predictions. In this paper, we ask whether such rationale models can also provide robustness to adversarial attacks in addition to their interpretable nature. Since these models need to first generate rationales ("rationalizer") before making predictions ("predictor"), they have the potential to ignore noise or adversarially added text by simply masking it out of the generated rationale. To this end, we systematically generate various types of 'AddText' attacks for both token and sentence-level rationalization tasks, and perform an extensive empirical evaluation of state-of-the-art rationale models across five different tasks. Our experiments reveal that the rationale models show the promise to improve robustness, while they struggle in certain scenarios--when the rationalizer is sensitive to positional bias or lexical choices of attack text. Further, leveraging human rationale as supervision does not always translate to better performance. Our study is a first step towards exploring the interplay between interpretability and robustness in the rationalize-then-predict framework.
    Probabilistic Symmetry for Improved Trajectory Forecasting. (arXiv:2205.01927v1 [cs.LG])
    Trajectory prediction is a core AI problem with broad applications in robotics and autonomous driving. While most existing works focus on deterministic prediction, producing probabilistic forecasts to quantify prediction uncertainty is critical for downstream decision-making tasks such as risk assessment, motion planning, and safety guarantees. We introduce a new metric, mean regional score (MRS), to evaluate the quality of probabilistic trajectory forecasts. We propose a novel probabilistic trajectory prediction model, Probabilistic Equivariant Continuous COnvolution (PECCO) and show that leveraging symmetry, specifically rotation equivariance, can improve the predictions' accuracy as well as coverage. On both vehicle and pedestrian datasets, PECCO shows state-of-the-art prediction performance and improved calibration compared to baselines.
    A Lightweight and Accurate Spatial-Temporal Transformer for Traffic Forecasting. (arXiv:2201.00008v3 [cs.LG] UPDATED)
    We study the forecasting problem for traffic with dynamic, possibly periodical, and joint spatial-temporal dependency between regions. Given the aggregated inflow and outflow traffic of regions in a city from time slots 0 to t-1, we predict the traffic at time t at any region. Prior art in the area often considers the spatial and temporal dependencies in a decoupled manner or is rather computationally intensive in training, with a large number of hyper-parameters to tune. We propose ST-TIS, a novel, lightweight, and accurate Spatial-Temporal Transformer with information fusion and region sampling for traffic forecasting. ST-TIS extends the canonical Transformer with information fusion and region sampling. The information fusion module captures the complex spatial-temporal dependency between regions. The region sampling module improves efficiency and prediction accuracy, cutting the computational complexity of dependency learning from $O(n^2)$ to $O(n\sqrt{n})$, where n is the number of regions. With far fewer parameters than state-of-the-art models, the offline training of our model is significantly faster in terms of tuning and computation (with a reduction of up to $90\%$ on training time and network parameters). Notwithstanding such training efficiency, extensive experiments show that ST-TIS is substantially more accurate in online prediction than state-of-the-art approaches (with an average improvement of up to $9.5\%$ on RMSE, and $12.4\%$ on MAPE).
    Root-aligned SMILES: A Tight Representation for Chemical Reaction Prediction. (arXiv:2203.11444v3 [cs.LG] UPDATED)
    Chemical reaction prediction, involving forward synthesis and retrosynthesis prediction, is a fundamental problem in organic synthesis. A popular computational paradigm formulates synthesis prediction as a sequence-to-sequence translation problem, where the typical SMILES is adopted for molecule representations. However, the general-purpose SMILES neglects the characteristics of chemical reactions, where the molecular graph topology is largely unaltered from reactants to products, resulting in the suboptimal performance of SMILES if straightforwardly applied. In this article, we propose the root-aligned SMILES (R-SMILES), which specifies a tightly aligned one-to-one mapping between the product and the reactant SMILES for more efficient synthesis prediction. Due to the strict one-to-one mapping and reduced edit distance, the computational model is largely relieved from learning the complex syntax and dedicated to learning the chemical knowledge for reactions. We compare the proposed R-SMILES with various state-of-the-art baselines and show that it significantly outperforms them all, demonstrating the superiority of the proposed method.
    Optimizing Mixture of Experts using Dynamic Recompilations. (arXiv:2205.01848v1 [cs.LG])
    The Mixture of Experts architecture allows for outrageously large neural networks by scaling model parameter size independently from computational demand (FLOPs). However, current DNN frameworks cannot effectively support the dynamic data flow in Mixture of Experts, and implementations on top of these frameworks need to use workarounds that introduce significant overheads. To address the limitation of these frameworks, we present DynaMoE, a DNN library that uses dynamic recompilations to optimize and adapt the use of computational resources to the dynamic needs of Mixture of Experts models. Our evaluation shows that DynaMoE achieves a 1.8x speedup and supports 2.3x larger model sizes when compared to existing MoE systems, even when not using recompilations. We then present further optimizations enabled by dynamic recompilations that yield an additional 1.7x speedup while simultaneously reducing memory pressure and improving model quality.
    DIAS: A Domain-Independent Alife-Based Problem-Solving System. (arXiv:2203.06855v2 [cs.NE] UPDATED)
    A domain-independent problem-solving system based on principles of Artificial Life is introduced. In this system, DIAS, the input and output dimensions of the domain are laid out in a spatial medium. A population of actors, each seeing only part of this medium, solves problems collectively in it. The process is independent of the domain and can be implemented through different kinds of actors. Through a set of experiments on various problem domains, DIAS is shown able to solve problems with different dimensionality and complexity, to require no hyperparameter tuning for new problems, and to exhibit lifelong learning, i.e. adapt rapidly to run-time changes in the problem domain, and do it better than a standard non-collective approach. DIAS therefore demonstrates a role for Alife in building scalable, general, and adaptive problem-solving systems.
    Word Tour: One-dimensional Word Embeddings via the Traveling Salesman Problem. (arXiv:2205.01954v1 [cs.CL])
    Word embeddings are one of the most fundamental technologies used in natural language processing. Existing word embeddings are high-dimensional and consume considerable computational resources. In this study, we propose WordTour, unsupervised one-dimensional word embeddings. To achieve the challenging goal, we propose a decomposition of the desiderata of word embeddings into two parts, completeness and soundness, and focus on soundness in this paper. Owing to the single dimensionality, WordTour is extremely efficient and provides a minimal means to handle word embeddings. We experimentally confirmed the effectiveness of the proposed method via user study and document classification.
    VICE: Variational Inference for Concept Embeddings. (arXiv:2205.00756v3 [cs.LG] UPDATED)
    In this paper, we introduce Variational Inference for Concept Embeddings (VICE), an approximate Bayesian method for learning object concept embeddings from human behavior in an odd-one-out triplet task. We use variational inference to obtain a sparse, non-negative solution with uncertainty estimates about each embedding value. We exploit these estimates to automatically select the dimensions that explain the data while yielding reproducible embeddings. We introduce a PAC learning bound for VICE that can be used to estimate generalization performance or determine a sufficient sample size for different experimental designs. VICE rivals or outperforms its predecessor, SPoSE, at predicting human behavior in a triplet task. VICE object representations are substantially more reproducible and consistent across different random initializations.
    The scope for AI-augmented interpretation of building blueprints in commercial and industrial property insurance. (arXiv:2205.01671v1 [cs.CV])
    This report, commissioned by the WTW research network, investigates the use of AI in property risk assessment. It (i) reviews existing work on risk assessment in commercial and industrial properties and automated information extraction from building blueprints; and (ii) presents an exploratory 'proof-of-concept solution' exploring the feasibility of using machine learning for the automated extraction of information from building blueprints to support insurance risk assessment.
    Finding patterns in Knowledge Attribution for Transformers. (arXiv:2205.01366v2 [cs.CL] UPDATED)
    We analyze the Knowledge Neurons framework for the attribution of factual and relational knowledge to particular neurons in the transformer network. We use a 12-layer multi-lingual BERT model for our experiments. Our study reveals various interesting phenomena. We observe that mostly factual knowledge can be attributed to middle and higher layers of the network ($\ge 6$). Further analysis reveals that the middle layers ($6-9$) are mostly responsible for relational information, which is further refined into actual factual knowledge or the "correct answer" in the last few layers ($10-12$). Our experiments also show that the model handles prompts in different languages, but representing the same fact, similarly, providing further evidence for the effectiveness of multi-lingual pre-training. Applying the attribution scheme for grammatical knowledge, we find that grammatical knowledge is far more dispersed among the neurons than factual knowledge.
    Cross-Loss Influence Functions to Explain Deep Network Representations. (arXiv:2012.01685v2 [cs.LG] UPDATED)
    As machine learning is increasingly deployed in the real world, it is paramount that we develop the tools necessary to analyze the decision-making of the models we train and deploy to end-users. Recently, researchers have shown that influence functions, a statistical measure of sample impact, can approximate the effects of training samples on classification accuracy for deep neural networks. However, this prior work only applies to supervised learning, where training and testing share an objective function. No approaches currently exist for estimating the influence of unsupervised training examples for deep learning models. To bring explainability to unsupervised and semi-supervised training regimes, we derive the first theoretical and empirical demonstration that influence functions can be extended to handle mismatched training and testing (i.e., "cross-loss") settings. Our formulation enables us to compute the influence in an unsupervised learning setup, explain cluster memberships, and identify and augment biases in language models. Our experiments show that our cross-loss influence estimates even exceed matched-objective influence estimation relative to ground-truth sample impact.
    Processing Network Controls via Deep Reinforcement Learning. (arXiv:2205.02119v1 [math.OC])
    Novel advanced policy gradient (APG) algorithms, such as proximal policy optimization (PPO), trust region policy optimization, and their variations, have become the dominant reinforcement learning (RL) algorithms because of their ease of implementation and good practical performance. This dissertation is concerned with theoretical justification and practical application of the APG algorithms for solving processing network control optimization problems. Processing network control problems are typically formulated as Markov decision process (MDP) or semi-Markov decision process (SMDP) problems that have several features that are unconventional for RL: infinite state spaces, unbounded costs, and long-run average cost objectives. Policy improvement bounds play a crucial role in the theoretical justification of the APG algorithms. In this thesis we refine existing bounds for MDPs with finite state spaces and prove novel policy improvement bounds for classes of MDPs and SMDPs used to model processing network operations. We consider two examples of processing network control problems and customize the PPO algorithm to solve them. First, we consider parallel-server and multiclass queueing networks controls. Second, we consider the drivers repositioning problem in a ride-hailing service system. For both examples the PPO algorithm with auxiliary modifications consistently generates control policies that outperform state-of-the-art heuristics.
    Accelerating Inhibitor Discovery for Multiple SARS-CoV-2 Targets with a Single, Sequence-Guided Deep Generative Framework. (arXiv:2204.09042v2 [q-bio.QM] UPDATED)
    The COVID-19 pandemic has highlighted the urgency for developing more efficient molecular discovery pathways. As exhaustive exploration of the vast chemical space is infeasible, discovering novel inhibitor molecules for emerging drug-target proteins is challenging, particularly for targets with unknown structure or ligands. We demonstrate the broad utility of a single deep generative framework toward discovering novel drug-like inhibitor molecules against two distinct SARS-CoV-2 targets -- the main protease (Mpro) and the receptor binding domain (RBD) of the spike protein. To perform target-aware design, the framework employs a target sequence-conditioned sampling of novel molecules from a generative model. Micromolar-level in vitro inhibition was observed for two candidates (out of four synthesized) for each target. The most potent spike RBD inhibitor also emerged as a rare non-covalent antiviral with broad-spectrum activity against several SARS-CoV-2 variants in live virus neutralization assays. These results show that a broadly deployable machine intelligence framework can accelerate hit discovery across different emerging drug-targets.
    An improved central limit theorem and fast convergence rates for entropic transportation costs. (arXiv:2204.09105v2 [math.ST] UPDATED)
    We prove a central limit theorem for the entropic transportation cost between subgaussian probability measures, centered at the population cost. This is the first result which allows for asymptotically valid inference for entropic optimal transport between measures which are not necessarily discrete. In the compactly supported case, we complement these results with new, faster, convergence rates for the expected entropic transportation cost between empirical measures. Our proof is based on strengthening convergence results for dual solutions to the entropic optimal transport problem.
    Prediction of fish location by combining fisheries data and sea bottom temperature forecasting. (arXiv:2205.02107v1 [cs.CV])
    This paper combines fisheries dependent data and environmental data to be used in a machine learning pipeline to predict the spatio-temporal abundance of two species (plaice and sole) commonly caught by the Belgian fishery in the North Sea. By combining fisheries related features with environmental data, sea bottom temperature derived from remote sensing, a higher accuracy can be achieved. In a forecast setting, the predictive accuracy is further improved by predicting, using a recurrent deep neural network, the sea bottom temperature up to four days in advance instead of relying on the last previous temperature measurement.
    Predicting vacant parking space availability zone-wisely: a graph based spatio-temporal prediction approach. (arXiv:2205.02113v1 [cs.LG])
    Vacant parking space (VPS) prediction is one of the key issues of intelligent parking guidance systems. Accurately predicting VPS information plays a crucial role in intelligent parking guidance systems, which can help drivers find parking space quickly, reducing unnecessary waste of time and excessive environmental pollution. Through a simple analysis of historical data, we found that there exists not only an obvious temporal correlation within each parking lot, but also a clear spatial correlation between different parking lots. In view of this, this paper proposes a graph-data-based model, ST-GBGRU (Spatial-Temporal Graph Based Gated Recurrent Unit), with which the number of VPSs can be predicted both in the short term (i.e., within 30 min) and in the long term (i.e., over 30 min). The temporal correlation of historical VPS data is extracted by a GRU, while the spatial correlation of historical VPS data is extracted by a GCN inside the GRU. Two prediction methods, namely direct prediction and iterative prediction, are combined with the proposed model. Finally, the prediction model is applied to predict the number of VPSs of 8 public parking lots in Santa Monica. The results show that, in both the short-term and long-term prediction tasks, the ST-GBGRU model achieves high accuracy and has good application prospects.
    Domino Saliency Metrics: Improving Existing Channel Saliency Metrics with Structural Information. (arXiv:2205.02131v1 [cs.CV])
    Channel pruning is used to reduce the number of weights in a Convolutional Neural Network (CNN). Channel pruning removes slices of the weight tensor so that the convolution layer remains dense. The removal of these weight slices from a single layer causes a mismatch in the number of feature maps between layers of the network. A simple solution is to force the number of feature maps between layers to match through the removal of weight slices from subsequent layers. This additional constraint becomes more apparent in DNNs with branches where multiple channels need to be pruned together to keep the network dense. Popular pruning saliency metrics do not factor in the structural dependencies that arise in DNNs with branches. We propose Domino metrics (built on existing channel saliency metrics) to reflect these structural constraints. We test Domino saliency metrics against the baseline channel saliency metrics on multiple networks with branches. Domino saliency metrics improved pruning rates in most tested networks and up to 25% in AlexNet on CIFAR-10.
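    The structural constraint described here (removing output channels from one layer forces the removal of the matching input channels in the next) can be sketched with plain NumPy arrays. This is an illustrative sketch only: the helper names `prune_channels` and `l1_saliency` are hypothetical, the L1 norm stands in for a generic baseline saliency, and the Domino metrics themselves are not reproduced because the abstract does not define them.

```python
import numpy as np

def l1_saliency(w):
    # L1 norm of each output-channel slice: a common baseline saliency,
    # assumed here for illustration (not the Domino metric).
    return np.abs(w).sum(axis=(1, 2, 3))

def prune_channels(w1, w2, keep):
    """Prune layer 1 and keep the following layer consistent with it.

    w1: (out1, in1, kh, kw) weights of the layer being pruned
    w2: (out2, out1, kh, kw) weights of the layer consuming its feature maps
    keep: indices of layer-1 output channels to retain
    """
    keep = np.asarray(keep)
    w1_pruned = w1[keep]      # drop whole output-channel slices; tensor stays dense
    w2_pruned = w2[:, keep]   # drop the input channels that no longer exist
    return w1_pruned, w2_pruned

# Toy usage: keep the 6 most salient of 8 channels.
rng = np.random.default_rng(0)
w1 = rng.normal(size=(8, 3, 3, 3))
w2 = rng.normal(size=(16, 8, 3, 3))
keep = np.sort(np.argsort(l1_saliency(w1))[-6:])
w1_small, w2_small = prune_channels(w1, w2, keep)
```

    In a branched network, several layers may feed the same consumer, so the same `keep` set would have to be applied to all of them at once; that coupling is the dependency the abstract says baseline metrics ignore.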
    A Game-Theoretic Approach for Improving Generalization Ability of TSP Solvers. (arXiv:2110.15105v3 [cs.LG] UPDATED)
    In this paper, we introduce a two-player zero-sum framework between a trainable \emph{Solver} and a \emph{Data Generator} to improve the generalization ability of deep learning-based solvers for Traveling Salesman Problem (TSP). Grounded in \textsl{Policy Space Response Oracle} (PSRO) methods, our two-player framework outputs a population of best-responding Solvers, over which we can mix and output a combined model that achieves the least exploitability against the Generator, and thereby the most generalizable performance on different TSP tasks. We conduct experiments on a variety of TSP instances with different types and sizes. Results suggest that our Solvers achieve the state-of-the-art performance even on tasks the Solver never meets, whilst the performance of other deep learning-based Solvers drops sharply due to over-fitting. To demonstrate the principle of our framework, we study the learning outcome of the proposed two-player game and demonstrate that the exploitability of the Solver population decreases during training, and it eventually approximates the Nash equilibrium along with the Generator.
    Modelling calibration uncertainty in networks of environmental sensors. (arXiv:2205.01988v1 [cs.LG])
    Networks of low-cost sensors are becoming ubiquitous, but often suffer from low accuracies and drift. Regular colocation with reference sensors allows recalibration but is often complicated and expensive. Alternatively, the calibration can be transferred using low-cost, mobile sensors, often at very low cost. However, inferring appropriate estimates of the calibration functions (with uncertainty) for the network of sensors becomes difficult, especially as the network of visits by the mobile, low-cost sensors becomes large. We propose a variational approach to model the calibration across the network of sensors. We demonstrate the approach on both synthetic and real air pollution data, and find it can perform better than the state of the art (multi-hop calibration). We extend it to categorical data, combining classifications of insects by non-expert citizen scientists. Achieving uncertainty-quantified calibration has been one of the major barriers to low-cost sensor deployment and citizen-science research. We hope that the methods described will enable such projects.
    On Circuit Depth Scaling For Quantum Approximate Optimization. (arXiv:2205.01698v1 [quant-ph])
    Variational quantum algorithms are the centerpiece of modern quantum programming. These algorithms involve training parameterized quantum circuits using a classical co-processor, an approach adapted partly from classical machine learning. An important subclass of these algorithms, designed for combinatorial optimization on current quantum hardware, is the quantum approximate optimization algorithm (QAOA). It is known that problem density - a problem constraint to variable ratio - induces under-parametrization in fixed depth QAOA. Density dependent performance has been reported in the literature, yet the circuit depth required to achieve fixed performance (henceforth called critical depth) remained unknown. Here, we propose a predictive model, based on a logistic saturation conjecture for critical depth scaling with respect to density. Focusing on random instances of MAX-2-SAT, we test our predictive model against simulated data with up to 15 qubits. We report that the average critical depth required to attain a success probability of 0.7 saturates at a value of 10 for densities beyond 4. We observe the predictive model to describe the simulated data within a $3\sigma$ confidence interval. Furthermore, based on the model, a linear trend for the critical depth with respect to problem size is recovered for the range of 5 to 15 qubits.
    EmoBank: Studying the Impact of Annotation Perspective and Representation Format on Dimensional Emotion Analysis. (arXiv:2205.01996v1 [cs.CL])
    We describe EmoBank, a corpus of 10k English sentences balancing multiple genres, which we annotated with dimensional emotion metadata in the Valence-Arousal-Dominance (VAD) representation format. EmoBank excels with a bi-perspectival and bi-representational design. On the one hand, we distinguish between writer's and reader's emotions, on the other hand, a subset of the corpus complements dimensional VAD annotations with categorical ones based on Basic Emotions. We find evidence for the supremacy of the reader's perspective in terms of IAA and rating intensity, and achieve close-to-human performance when mapping between dimensional and categorical formats.
    A Manifold Two-Sample Test Study: Integral Probability Metric with Neural Networks. (arXiv:2205.02043v1 [stat.ML])
    Two-sample testing is an important problem that aims to determine whether two collections of observations follow the same distribution or not. We propose two-sample tests based on integral probability metric (IPM) for high-dimensional samples supported on a low-dimensional manifold. We characterize the properties of the proposed tests with respect to the number of samples $n$ and the structure of the manifold with intrinsic dimension $d$. When an atlas is given, we propose a two-step test to identify the difference between general distributions, which achieves a type-II risk on the order of $n^{-1/\max\{d,2\}}$. When an atlas is not given, we propose a H\"older IPM test that applies to data distributions with $(s,\beta)$-H\"older densities, which achieves a type-II risk on the order of $n^{-(s+\beta)/d}$. To mitigate the heavy computation burden of evaluating the H\"older IPM, we approximate the H\"older function class using neural networks. Based on the approximation theory of neural networks, we show that the neural network IPM test has a type-II risk on the order of $n^{-(s+\beta)/d}$, which is of the same order as the type-II risk of the H\"older IPM test. Our proposed tests are adaptive to low-dimensional geometric structure because their performance crucially depends on the intrinsic dimension instead of the data dimension.
    FEDNEST: Federated Bilevel, Minimax, and Compositional Optimization. (arXiv:2205.02215v1 [cs.LG])
    Standard federated optimization methods successfully apply to stochastic problems with \textit{single-level} structure. However, many contemporary ML problems -- including adversarial robustness, hyperparameter tuning, and actor-critic -- fall under nested bilevel programming that subsumes minimax and compositional optimization. In this work, we propose FedNest: A federated alternating stochastic gradient method to address general nested problems. We establish provable convergence rates for FedNest in the presence of heterogeneous data and introduce variations for bilevel, minimax, and compositional optimization. FedNest introduces multiple innovations including federated hypergradient computation and variance reduction to address inner-level heterogeneity. We complement our theory with experiments on hyperparameter \& hyper-representation learning and minimax optimization that demonstrate the benefits of our method in practice. Code is available at https://github.com/mc-nya/FedNest.
    Generalized Multi-Output Gaussian Process Censored Regression. (arXiv:2009.04822v2 [stat.ML] UPDATED)
    When modelling censored observations, a typical approach in current regression methods is to use a censored-Gaussian (i.e. Tobit) model to describe the conditional output distribution. In this paper, as in the case of missing data, we argue that exploiting correlations between multiple outputs can enable models to better address the bias introduced by censored data. To do so, we introduce a heteroscedastic multi-output Gaussian process model which combines the non-parametric flexibility of GPs with the ability to leverage information from correlated outputs under input-dependent noise conditions. To address the resulting inference intractability, we further devise a variational bound to the marginal log-likelihood suitable for stochastic optimization. We empirically evaluate our model against other generative models for censored data on both synthetic and real world tasks and further show how it can be generalized to deal with arbitrary likelihood functions. Results show how the added flexibility allows our model to better estimate the underlying non-censored (i.e. true) process under potentially complex censoring dynamics.
    Balsa: Learning a Query Optimizer Without Expert Demonstrations. (arXiv:2201.01441v2 [cs.DB] UPDATED)
    Query optimizers are a performance-critical component in every database system. Due to their complexity, optimizers take experts months to write and years to refine. In this work, we demonstrate for the first time that learning to optimize queries without learning from an expert optimizer is both possible and efficient. We present Balsa, a query optimizer built by deep reinforcement learning. Balsa first learns basic knowledge from a simple, environment-agnostic simulator, followed by safe learning in real execution. On the Join Order Benchmark, Balsa matches the performance of two expert query optimizers, both open-source and commercial, with two hours of learning, and outperforms them by up to 2.8$\times$ in workload runtime after a few more hours. Balsa thus opens the possibility of automatically learning to optimize in future compute environments where expert-designed optimizers do not exist.
    Policy Optimization Using Semi-parametric Models for Dynamic Pricing. (arXiv:2109.06368v2 [cs.LG] UPDATED)
    In this paper, we study the contextual dynamic pricing problem where the market value of a product is linear in its observed features plus some market noise. Products are sold one at a time, and only a binary response indicating success or failure of a sale is observed. Our model setting is similar to Javanmard and Nazerzadeh [2019] except that we expand the demand curve to a semiparametric model and need to learn dynamically both parametric and nonparametric components. We propose a dynamic statistical learning and decision-making policy that combines semiparametric estimation from a generalized linear model with an unknown link and online decision-making to minimize regret (maximize revenue). Under mild conditions, we show that for a market noise c.d.f. $F(\cdot)$ with $m$-th order derivative ($m\geq 2$), our policy achieves a regret upper bound of $\tilde{O}_{d}(T^{\frac{2m+1}{4m-1}})$, where $T$ is time horizon and $\tilde{O}_{d}$ is the order that hides logarithmic terms and the dimensionality of feature $d$. The upper bound is further reduced to $\tilde{O}_{d}(\sqrt{T})$ if $F$ is super smooth whose Fourier transform decays exponentially. In terms of dependence on the horizon $T$, these upper bounds are close to $\Omega(\sqrt{T})$, the lower bound where $F$ belongs to a parametric class. We further generalize these results to the case with dynamically dependent product features under the strong mixing condition.
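    As a quick arithmetic check of how the stated regret exponent behaves, the short derivation below evaluates $\frac{2m+1}{4m-1}$ for small $m$ and shows its gap to the $\sqrt{T}$ rate. It uses only the exponent quoted in the abstract; the specific values of $m$ are chosen for illustration.

```latex
\[
\frac{2m+1}{4m-1} - \frac{1}{2}
  = \frac{2(2m+1) - (4m-1)}{2(4m-1)}
  = \frac{3}{2(4m-1)} \;>\; 0,
\qquad
\lim_{m\to\infty} \frac{2m+1}{4m-1} = \frac{1}{2}.
\]
\[
m = 2:\ \frac{5}{7} \approx 0.714,
\qquad
m = 3:\ \frac{7}{11} \approx 0.636,
\qquad
m = 5:\ \frac{11}{19} \approx 0.579 .
\]
```

    The exponent therefore decreases monotonically in $m$ toward $1/2$, which is consistent with the abstract's statement that the bound becomes $\tilde{O}_{d}(\sqrt{T})$ in the super-smooth case.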
    MAD: Self-Supervised Masked Anomaly Detection Task for Multivariate Time Series. (arXiv:2205.02100v1 [cs.LG])
    In this paper, we introduce Masked Anomaly Detection (MAD), a general self-supervised learning task for multivariate time series anomaly detection. With the increasing availability of sensor data from industrial systems, being able to detect anomalies from streams of multivariate time series data is of significant importance. Given the scarcity of anomalies in real-world applications, the majority of literature has been focusing on modeling normality. The learned normal representations can empower anomaly detection as the model has learned to capture certain key underlying data regularities. A typical formulation is to learn a predictive model, i.e., use a window of time series data to predict future data values. In this paper, we propose an alternative self-supervised learning task. By randomly masking a portion of the inputs and training a model to estimate them using the remaining ones, MAD is an improvement over the traditional left-to-right next step prediction (NSP) task. Our experimental results demonstrate that MAD can achieve better anomaly detection rates over traditional NSP approaches when using exactly the same neural network (NN) base models, and can be modified to run as fast as NSP models during test time on the same hardware, thus making it an ideal upgrade for many existing NSP-based NN anomaly detection models.
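    The masking task described above can be illustrated with a small data-preparation sketch. Everything here is an assumption made for illustration: the abstract does not say whether single entries or whole time steps are masked, what value replaces a hidden input, or which loss is used, so the helpers `make_masked_example` and `masked_mse`, the element-wise mask, and the zero fill value are hypothetical choices.

```python
import numpy as np

def make_masked_example(window, mask_ratio=0.2, seed=0):
    """Build one self-supervised example from a window of shape (time, features).

    Returns (masked_input, mask, target); mask is True where a value is hidden.
    """
    rng = np.random.default_rng(seed)
    mask = rng.random(window.shape) < mask_ratio
    masked = np.where(mask, 0.0, window)  # hidden entries set to 0 (assumption)
    return masked, mask, window

def masked_mse(pred, target, mask):
    # Reconstruction error measured only on the hidden entries.
    if not mask.any():
        return 0.0
    return float(np.mean((pred[mask] - target[mask]) ** 2))

# Toy usage: a 10-step window with 4 sensor channels.
window = np.arange(1, 41, dtype=float).reshape(10, 4)
masked, mask, target = make_masked_example(window, mask_ratio=0.3, seed=1)
error_of_trivial_guess = masked_mse(masked, target, mask)
```

    At test time, a large reconstruction error on masked entries would serve as the anomaly score, in the same way that a large prediction error does for an NSP model.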
    Accelerating phase-field-based simulation via machine learning. (arXiv:2205.02121v1 [cond-mat.mtrl-sci])
    Phase-field-based models have become common in material science, mechanics, physics, biology, chemistry, and engineering for the simulation of microstructure evolution. Yet, they suffer from the drawback of being computationally very costly when applied to large, complex systems. To reduce such computational costs, a Unet-based artificial neural network is developed as a surrogate model in the current work. Training input for this network is obtained from the results of the numerical solution of initial-boundary-value problems (IBVPs) based on the Fan-Chen model for grain microstructure evolution. In particular, about 250 different simulations with varying initial order parameters are carried out and 200 frames of the time evolution of the phase fields are stored for each simulation. The network is trained with 90% of this data, taking the $i$-th frame of a simulation, i.e. order parameter field, as input, and producing the $(i+1)$-th frame as the output. Evaluation of the network is carried out with a test dataset consisting of 2200 microstructures based on different configurations than originally used for training. The trained network is applied recursively on initial order parameters to calculate the time evolution of the phase fields. The results are compared to the ones obtained from the conventional numerical solution in terms of the errors in order parameters and the system's free energy. The resulting order parameter error averaged over all points and all simulation cases is 0.005 and the relative error in the total free energy in all simulation boxes does not exceed 1%.
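    The recursive use of a one-step surrogate described above can be sketched as a simple rollout loop. Here `toy_step` is a hypothetical stand-in (a periodic 1D smoothing filter), not the U-Net from the paper; it only serves to show how each predicted frame is fed back as the next input.

```python
import math


def rollout(step, initial_frame, n_steps):
    """Apply a one-step model recursively, returning all frames including the initial one."""
    frames = [list(initial_frame)]
    for _ in range(n_steps):
        # The previous output becomes the next input.
        frames.append(step(frames[-1]))
    return frames


def toy_step(frame):
    """Hypothetical one-step model: periodic three-point averaging of a 1D field."""
    n = len(frame)
    return [(frame[i - 1] + frame[i] + frame[(i + 1) % n]) / 3.0 for i in range(n)]


def mean_abs_error(a, b):
    """Point-wise mean absolute difference between two frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


initial = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]
frames = rollout(toy_step, initial, n_steps=5)
print("frames:", len(frames), "| drift from start:", mean_abs_error(frames[0], frames[-1]))
```

    Because errors made at one step are carried into the next, a rollout of this kind is usually checked against the reference solution over the whole trajectory, as the abstract does with the order parameter and free energy errors.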
    Synthesized Speech Detection Using Convolutional Transformer-Based Spectrogram Analysis. (arXiv:2205.01800v1 [cs.SD])
    Synthesized speech is common today due to the prevalence of virtual assistants, easy-to-use tools for generating and modifying speech signals, and remote work practices. Synthesized speech can also be used for nefarious purposes, including creating a purported speech signal and attributing it to someone who did not speak the content of the signal. We need methods to detect if a speech signal is synthesized. In this paper, we analyze speech signals in the form of spectrograms with a Compact Convolutional Transformer (CCT) for synthesized speech detection. A CCT utilizes a convolutional layer that introduces inductive biases and shared weights into a network, allowing a transformer architecture to perform well with fewer data samples used for training. The CCT uses an attention mechanism to incorporate information from all parts of a signal under analysis. Trained on both genuine human voice signals and synthesized human voice signals, we demonstrate that our CCT approach successfully differentiates between genuine and synthesized speech signals.
    Uncertainty-Autoencoder-Based Privacy and Utility Preserving Data Type Conscious Transformation. (arXiv:2205.01950v1 [cs.LG])
    We propose an adversarial learning framework that deals with the privacy-utility tradeoff problem under two types of conditions: data-type ignorant, and data-type aware. Under data-type aware conditions, the privacy mechanism provides a one-hot encoding of categorical features, representing exactly one class, while under data-type ignorant conditions the categorical variables are represented by a collection of scores, one for each class. We use a neural network architecture consisting of a generator and a discriminator, where the generator consists of an encoder-decoder pair, and the discriminator consists of an adversary and a utility provider. Unlike previous research considering this kind of architecture, which leverages autoencoders (AEs) without introducing any randomness, or variational autoencoders (VAEs) based on learning latent representations which are then forced into a Gaussian assumption, our proposed technique introduces randomness and removes the Gaussian assumption restriction on the latent variables, only focusing on the end-to-end stochastic mapping of the input to privatized data. We test our framework on different datasets: MNIST, FashionMNIST, UCI Adult, and US Census Demographic Data, providing a wide range of possible private and utility attributes. We use multiple adversaries simultaneously to test our privacy mechanism -- some trained from the ground truth data and some trained from the perturbed data generated by our privacy mechanism. Through comparative analysis, our results demonstrate better privacy and utility guarantees than the existing works under similar, data-type ignorant conditions, even when the latter are considered under their original restrictive single-adversary model.
    Do More Negative Samples Necessarily Hurt in Contrastive Learning?. (arXiv:2205.01789v1 [cs.LG])
    Recent investigations in noise contrastive estimation suggest, both empirically as well as theoretically, that while having more "negative samples" in the contrastive loss improves downstream classification performance initially, beyond a threshold, it hurts downstream performance due to a "collision-coverage" trade-off. But is such a phenomenon inherent in contrastive learning? We show in a simple theoretical setting, where positive pairs are generated by sampling from the underlying latent class (introduced by Saunshi et al. (ICML 2019)), that the downstream performance of the representation optimizing the (population) contrastive loss in fact does not degrade with the number of negative samples. Along the way, we give a structural characterization of the optimal representation in our framework, for noise contrastive estimation. We also provide empirical support for our theoretical results on CIFAR-10 and CIFAR-100 datasets.
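    For readers unfamiliar with where "negative samples" enter, the sketch below evaluates a common softmax-style contrastive loss for one anchor. The exact form used in the paper is not given in the abstract, so this form, the function name `contrastive_loss`, and the similarity scores are assumptions for illustration only.

```python
import math


def contrastive_loss(pos_score, neg_scores):
    """-log(e^{s+} / (e^{s+} + sum_k e^{s_k})), rewritten as log(1 + sum_k e^{s_k - s+})."""
    return math.log1p(sum(math.exp(s - pos_score) for s in neg_scores))


pos = 2.0  # hypothetical similarity of the positive pair
# Same hypothetical similarity for every negative, varying only their number.
losses = [contrastive_loss(pos, [0.5] * k) for k in (0, 1, 4, 16)]

for k, loss in zip((0, 1, 4, 16), losses):
    print(f"negatives={k:>2}: loss={loss:.4f}")
```

    The loss value itself grows with the number of negatives when the scores are held fixed; this says nothing about downstream accuracy, which is the quantity the paper analyzes.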
    Generalized Knowledge Distillation via Relationship Matching. (arXiv:2205.01915v1 [cs.CV])
    The knowledge of a well-trained deep neural network (a.k.a. the "teacher") is valuable for learning similar tasks. Knowledge distillation extracts knowledge from the teacher and integrates it with the target model (a.k.a. the "student"), which expands the student's knowledge and improves its learning efficacy. Instead of requiring the teacher to work on the same task as the student, we borrow the knowledge from a teacher trained from a general label space -- in this "Generalized Knowledge Distillation (GKD)", the classes of the teacher and the student may be the same, completely different, or partially overlapping. We claim that the comparison ability between instances acts as an essential factor threading knowledge across tasks, and propose the RElationship FacIlitated Local cLassifiEr Distillation (REFILLED) approach, which decouples the GKD flow of the embedding and the top-layer classifier. In particular, different from reconciling the instance-label confidence between models, REFILLED requires the teacher to reweight the hard tuples pushed forward by the student and then matches the similarity comparison levels between instances. An embedding-induced classifier based on the teacher model supervises the student's classification confidence and adaptively emphasizes the most related supervision from the teacher. REFILLED demonstrates strong discriminative ability when the classes of the teacher vary from the same to a fully non-overlapped set w.r.t. the student. It also achieves state-of-the-art performance on standard knowledge distillation, one-step incremental learning, and few-shot learning tasks.
    Zero-Episode Few-Shot Contrastive Predictive Coding: Solving intelligence tests without prior training. (arXiv:2205.01924v1 [cs.CV])
    Video prediction models often combine three components: an encoder from pixel space to a small latent space, a latent space prediction model, and a generative model back to pixel space. However, the large and unpredictable pixel space makes training such models difficult, requiring many training examples. We argue that finding a predictive latent variable and using it to evaluate the consistency of a future image enables data-efficient predictions because it removes the need to train a generative model. To demonstrate this, we created sequence completion intelligence tests in which the task is to identify a predictably changing feature in a sequence of images and use this prediction to select the subsequent image. We show that a one-dimensional Markov Contrastive Predictive Coding (M-CPC_1D) model solves these tests efficiently, with only five examples. Finally, we demonstrate the usefulness of M-CPC_1D in solving two tasks without prior training: anomaly detection and stochastic movement video prediction.
    A Deep Learning-based Integrated Framework for Quality-aware Undersampled Cine Cardiac MRI Reconstruction and Analysis. (arXiv:2205.01673v1 [eess.IV])
    Cine cardiac magnetic resonance (CMR) imaging is considered the gold standard for cardiac function evaluation. However, cine CMR acquisition is inherently slow and in recent decades considerable effort has been put into accelerating scan times without compromising image quality or the accuracy of derived results. In this paper, we present a fully-automated, quality-controlled integrated framework for reconstruction, segmentation and downstream analysis of undersampled cine CMR data. The framework enables active acquisition of radial k-space data, in which acquisition can be stopped as soon as acquired data are sufficient to produce high quality reconstructions and segmentations. This results in reduced scan times and automated analysis, enabling robust and accurate estimation of functional biomarkers. To demonstrate the feasibility of the proposed approach, we perform realistic simulations of radial k-space acquisitions on a dataset of subjects from the UK Biobank and present results on in-vivo cine CMR k-space data collected from healthy subjects. The results demonstrate that our method can produce quality-controlled images in a mean scan time reduced from 12 to 4 seconds per slice, and that image quality is sufficient to allow clinically relevant parameters to be automatically estimated to within 5% mean absolute difference.
    CoCa: Contrastive Captioners are Image-Text Foundation Models. (arXiv:2205.01917v1 [cs.CV])
    Exploring large-scale pretrained foundation models is of significant interest in computer vision because these models can be quickly transferred to many downstream tasks. This paper presents Contrastive Captioner (CoCa), a minimalist design to pretrain an image-text encoder-decoder foundation model jointly with contrastive loss and captioning loss, thereby subsuming model capabilities from contrastive approaches like CLIP and generative methods like SimVLM. In contrast to standard encoder-decoder transformers where all decoder layers attend to encoder outputs, CoCa omits cross-attention in the first half of decoder layers to encode unimodal text representations, and cascades the remaining decoder layers which cross-attend to the image encoder for multimodal image-text representations. We apply a contrastive loss between unimodal image and text embeddings, in addition to a captioning loss on the multimodal decoder outputs which predicts text tokens autoregressively. By sharing the same computational graph, the two training objectives are computed efficiently with minimal overhead. CoCa is pretrained end-to-end and from scratch on both web-scale alt-text data and annotated images by treating all labels simply as text, seamlessly unifying natural language supervision for representation learning. Empirically, CoCa achieves state-of-the-art performance with zero-shot transfer or minimal task-specific adaptation on a broad range of downstream tasks, spanning visual recognition (ImageNet, Kinetics-400/600/700, Moments-in-Time), crossmodal retrieval (MSCOCO, Flickr30K, MSR-VTT), multimodal understanding (VQA, SNLI-VE, NLVR2), and image captioning (MSCOCO, NoCaps). Notably on ImageNet classification, CoCa obtains 86.3% zero-shot top-1 accuracy, 90.6% with a frozen encoder and learned classification head, and new state-of-the-art 91.0% top-1 accuracy on ImageNet with a finetuned encoder.
    Meta-Cognition. An Inverse-Inverse Reinforcement Learning Approach for Cognitive Radars. (arXiv:2205.01794v1 [eess.SP])
    This paper considers meta-cognitive radars in an adversarial setting. A cognitive radar optimally adapts its waveform (response) in response to maneuvers (probes) of a possibly adversarial moving target. A meta-cognitive radar is aware of the adversarial nature of the target and seeks to mitigate the adversarial target. How should the meta-cognitive radar choose its responses to sufficiently confuse the adversary trying to estimate the radar's utility function? This paper abstracts the radar's meta-cognition problem in terms of the spectra (eigenvalues) of the state and observation noise covariance matrices, and embeds the algebraic Riccati equation into an economics-based utility maximization setup. This adversarial target is an inverse reinforcement learner. By observing a noisy sequence of the radar's responses (waveforms), the adversarial target uses a statistical hypothesis test to detect if the radar is a utility maximizer. In turn, the meta-cognitive radar deliberately chooses sub-optimal responses that increase the Type-I error probability of the adversary's detector. We call this counter-adversarial step taken by the meta-cognitive radar inverse inverse reinforcement learning (I-IRL). We illustrate the meta-cognition results of this paper via simple numerical examples. Our approach for meta-cognition in this paper is based on revealed preference theory in micro-economics and inspired by results in differential privacy and adversarial obfuscation in machine learning.
    Second Order Path Variationals in Non-Stationary Online Learning. (arXiv:2205.01921v1 [cs.LG])
    We consider the problem of universal dynamic regret minimization under exp-concave and smooth losses. We show that appropriately designed Strongly Adaptive algorithms achieve a dynamic regret of $\tilde O(d^2 n^{1/5} C_n^{2/5} \vee d^2)$, where $n$ is the time horizon and $C_n$ a path variational based on second order differences of the comparator sequence. Such a path variational naturally encodes comparator sequences that are piecewise linear -- a powerful family that tracks a variety of non-stationarity patterns in practice (Kim et al., 2009). The aforementioned dynamic regret rate is shown to be optimal modulo dimension dependencies and poly-logarithmic factors of $n$. Our proof techniques rely on analysing the KKT conditions of the offline oracle and require several non-trivial generalizations of the ideas in Baby and Wang, 2021, where the latter work only leads to a slower dynamic regret rate of $\tilde O(d^{2.5}n^{1/3}C_n^{2/3} \vee d^{2.5})$ for the current problem.
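    To make the gap between the two stated rates concrete, one can divide their leading terms. The short calculation below ignores poly-logarithmic factors and the $\vee\, d^2$ and $\vee\, d^{2.5}$ floors, so it is only a rough comparison under those simplifying assumptions.

```latex
\frac{d^{2.5}\, n^{1/3}\, C_n^{2/3}}{d^{2}\, n^{1/5}\, C_n^{2/5}}
  = d^{\,2.5-2}\; n^{\,1/3-1/5}\; C_n^{\,2/3-2/5}
  = d^{1/2}\; n^{2/15}\; C_n^{4/15}
```

    Under these simplifications the earlier rate is larger by a factor that grows polynomially in $n$ and in $C_n$ whenever $C_n \geq 1$.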
    A Comprehensive Survey and Taxonomy on Image Dehazing Based on Deep Learning. (arXiv:2106.03323v2 [cs.CV] UPDATED)
    With the development of convolutional neural networks, hundreds of deep learning based dehazing methods have been proposed. In this paper, we provide a comprehensive survey on supervised, semi-supervised, and unsupervised dehazing. We first discuss the physical model, datasets, network modules, loss functions, and evaluation metrics that are commonly used. Then, the main contributions of various dehazing algorithms are categorized and summarized. Further, quantitative and qualitative experiments of various baseline methods are carried out. Finally, the unsolved issues and challenges that can inspire future research are pointed out. A collection of useful dehazing materials is available at https://github.com/Xiaofeng-life/AwesomeDehazing.
    Investigating the Impact of Multi-LiDAR Placement on Object Detection for Autonomous Driving. (arXiv:2105.00373v4 [cs.RO] UPDATED)
    The past few years have witnessed an increasing interest in improving the perception performance of LiDARs on autonomous vehicles. While most of the existing works focus on developing new deep learning algorithms or model architectures, we study the problem from the physical design perspective, i.e., how different placements of multiple LiDARs influence the learning-based perception. To this end, we introduce an easy-to-compute information-theoretic surrogate metric to quantitatively and quickly evaluate LiDAR placement for 3D detection of different types of objects. We also present a new data collection, detection model training and evaluation framework in the realistic CARLA simulator to evaluate disparate multi-LiDAR configurations. Using several prevalent placements inspired by the designs of self-driving companies, we show the correlation between our surrogate metric and object detection performance of different representative algorithms on KITTI through extensive experiments, validating the effectiveness of our LiDAR placement evaluation approach. Our results show that sensor placement is non-negligible in 3D point cloud-based object detection, and can contribute up to a 10% performance discrepancy in terms of average precision in challenging 3D object detection settings. We believe that this is one of the first studies to quantitatively investigate the influence of LiDAR placement on perception performance. The code is available at https://github.com/HanjiangHu/Multi-LiDAR-Placement-for-3D-Detection.
    AmbiPun: Generating Humorous Puns with Ambiguous Context. (arXiv:2205.01825v1 [cs.CL])
    In this paper, we propose a simple yet effective way to generate pun sentences that does not require any training on existing puns. Our approach is inspired by humor theories that ambiguity comes from the context rather than the pun word itself. Given a pair of definitions of a pun word, our model first produces a list of related concepts through a reverse dictionary. We then utilize one-shot GPT3 to generate context words and then generate puns incorporating context words from both concepts. Human evaluation shows that our method successfully generates puns 52% of the time, outperforming well-crafted baselines and the state-of-the-art models by a large margin.
  • Open

    Learning the temporal evolution of multivariate densities via normalizing flows. (arXiv:2107.13735v2 [stat.ML] UPDATED)
    In this work, we propose a method to learn multivariate probability distributions using sample path data from stochastic differential equations. Specifically, we consider temporally evolving probability distributions (e.g., those produced by integrating local or nonlocal Fokker-Planck equations). We analyze this evolution through machine learning assisted construction of a time-dependent mapping that takes a reference distribution (say, a Gaussian) to each and every instance of our evolving distribution. If the reference distribution is the initial condition of a Fokker-Planck equation, what we learn is the time-T map of the corresponding solution. Specifically, the learned map is a multivariate normalizing flow that deforms the support of the reference density to the support of each and every density snapshot in time. We demonstrate that this approach can approximate probability density function evolutions in time from observed sampled data for systems driven by both Brownian and Lévy noise. We present examples with two- and three-dimensional, uni- and multimodal distributions to validate the method.
    Two Stage Curvature Identification with Machine Learning: Causal Inference with Possibly Invalid Instrumental Variables. (arXiv:2203.12808v2 [stat.ME] UPDATED)
    Instrumental variables regression is a popular causal inference method for endogenous treatment. A significant concern in practical applications is the validity and strength of instrumental variables. This paper aims to perform causal inference when all instruments are possibly invalid. To do this, we propose a novel methodology called two stage curvature identification (TSCI) together with a generalized concept to measure the strengths of possibly invalid instruments: such invalid instruments can still be used for inference in our framework. We fit the treatment model with a general machine learning method and propose a novel bias correction method to remove the overfitting bias from machine learning methods. Among a collection of spaces of violation functions, we choose the best one by evaluating invalid instrumental variables' strength. We demonstrate our proposed TSCI methodology in a large-scale simulation study and revisit the important economics question on the effect of education on earnings.
    Second Order Path Variationals in Non-Stationary Online Learning. (arXiv:2205.01921v1 [cs.LG])
    We consider the problem of universal dynamic regret minimization under exp-concave and smooth losses. We show that appropriately designed Strongly Adaptive algorithms achieve a dynamic regret of $\tilde O(d^2 n^{1/5} C_n^{2/5} \vee d^2)$, where $n$ is the time horizon and $C_n$ a path variational based on second order differences of the comparator sequence. Such a path variational naturally encodes comparator sequences that are piecewise linear -- a powerful family that tracks a variety of non-stationarity patterns in practice (Kim et al., 2009). The aforementioned dynamic regret rate is shown to be optimal modulo dimension dependencies and poly-logarithmic factors of $n$. Our proof techniques rely on analysing the KKT conditions of the offline oracle and require several non-trivial generalizations of the ideas in Baby and Wang, 2021, where the latter work only leads to a slower dynamic regret rate of $\tilde O(d^{2.5}n^{1/3}C_n^{2/3} \vee d^{2.5})$ for the current problem.
    Making SGD Parameter-Free. (arXiv:2205.02160v1 [math.OC])
    We develop an algorithm for parameter-free stochastic convex optimization (SCO) whose rate of convergence is only a double-logarithmic factor larger than the optimal rate for the corresponding known-parameter setting. In contrast, the best previously known rates for parameter-free SCO are based on online parameter-free regret bounds, which contain unavoidable excess logarithmic terms compared to their known-parameter counterparts. Our algorithm is conceptually simple, has high-probability guarantees, and is also partially adaptive to unknown gradient norms, smoothness, and strong convexity. At the heart of our results is a novel parameter-free certificate for SGD step size choice, and a time-uniform concentration result that assumes no a-priori bounds on SGD iterates.
    Multiple Testing and Variable Selection along the path of the Least Angle Regression. (arXiv:1906.12072v5 [math.ST] UPDATED)
    We investigate multiple testing and variable selection using the Least Angle Regression (LARS) algorithm in high dimensions under the assumption of Gaussian noise. LARS is known to produce a piecewise affine solution path with change points referred to as the knots of the LARS path. The key to our results is an expression in closed form of the exact joint law of a $K$-tuple of knots conditional on the variables selected by LARS, namely the so-called post-selection joint law of the LARS knots. Numerical experiments demonstrate the perfect fit of our findings. This paper makes three main contributions. First, we build testing procedures on variables entering the model along the LARS path in the general design case when the noise level can be unknown. These testing procedures are referred to as the Generalized $t$-Spacing tests (GtSt) and we prove that they have an exact non-asymptotic level (i.e., the Type I error is exactly controlled). This extends the work of Taylor et al. (2014), where the spacing test works for consecutive knots and known variance. Second, we introduce a new exact multiple false negatives test after model selection in the general design case when the noise level may be unknown. We prove that this testing procedure has exact non-asymptotic level for general design and unknown noise level. Third, we give an exact control of the false discovery rate under an orthogonal design assumption. Monte Carlo simulations and a real data experiment are provided to illustrate our results in this case. Of independent interest, we introduce an equivalent formulation of the LARS algorithm based on a recursive function.
    VICE: Variational Inference for Concept Embeddings. (arXiv:2205.00756v3 [cs.LG] UPDATED)
    In this paper, we introduce Variational Inference for Concept Embeddings (VICE), an approximate Bayesian method for learning object concept embeddings from human behavior in an odd-one-out triplet task. We use variational inference to obtain a sparse, non-negative solution with uncertainty estimates about each embedding value. We exploit these estimates to automatically select the dimensions that explain the data while yielding reproducible embeddings. We introduce a PAC learning bound for VICE that can be used to estimate generalization performance or determine a sufficient sample size for different experimental designs. VICE rivals or outperforms its predecessor, SPoSE, at predicting human behavior in a triplet task. VICE object representations are substantially more reproducible and consistent across different random initializations.
    Depth Uncertainty Networks for Active Learning. (arXiv:2112.06796v2 [cs.LG] UPDATED)
    In active learning, the size and complexity of the training dataset change over time. Simple models that are well specified by the amount of data available at the start of active learning might suffer from bias as more points are actively sampled. Flexible models that might be well suited to the full dataset can suffer from overfitting towards the start of active learning. We tackle this problem using Depth Uncertainty Networks (DUNs), a BNN variant in which the depth of the network, and thus its complexity, is inferred. We find that DUNs outperform other BNN variants on several active learning tasks. Importantly, we show that on the tasks in which DUNs perform best they present notably less overfitting than baselines.
    Negative Sampling in Variational Autoencoders. (arXiv:1910.02760v3 [cs.LG] UPDATED)
    Modern deep artificial neural networks have achieved great success in the domain of computer vision and beyond. However, their application to many real-world tasks is undermined by certain limitations, such as overconfident uncertainty estimates on out-of-distribution data or performance deterioration under data distribution shifts. Several types of deep learning models used for density estimation through probabilistic generative modeling have been shown to fail to detect out-of-distribution samples by assigning higher likelihoods to anomalous data. We investigate this failure mode in Variational Autoencoder models, which are also prone to this, and improve upon the out-of-distribution generalization performance of the model by employing an alternative training scheme utilizing negative samples. We present a fully unsupervised version: when the model is trained in an adversarial manner, the generator's own outputs can be used as negative samples. We demonstrate empirically the effectiveness of the approach in reducing the overconfident likelihood estimates of out-of-distribution inputs on image data.
    Policy Optimization Using Semi-parametric Models for Dynamic Pricing. (arXiv:2109.06368v2 [cs.LG] UPDATED)
    In this paper, we study the contextual dynamic pricing problem where the market value of a product is linear in its observed features plus some market noise. Products are sold one at a time, and only a binary response indicating success or failure of a sale is observed. Our model setting is similar to Javanmard and Nazerzadeh [2019] except that we expand the demand curve to a semiparametric model and need to learn dynamically both parametric and nonparametric components. We propose a dynamic statistical learning and decision-making policy that combines semiparametric estimation from a generalized linear model with an unknown link and online decision-making to minimize regret (maximize revenue). Under mild conditions, we show that for a market noise c.d.f. $F(\cdot)$ with $m$-th order derivative ($m\geq 2$), our policy achieves a regret upper bound of $\tilde{O}_{d}(T^{\frac{2m+1}{4m-1}})$, where $T$ is the time horizon and $\tilde{O}_{d}$ is the order that hides logarithmic terms and the dimensionality of feature $d$. The upper bound is further reduced to $\tilde{O}_{d}(\sqrt{T})$ if $F$ is super smooth, with a Fourier transform that decays exponentially. In terms of dependence on the horizon $T$, these upper bounds are close to $\Omega(\sqrt{T})$, the lower bound where $F$ belongs to a parametric class. We further generalize these results to the case with dynamically dependent product features under the strong mixing condition.
    Recurrent Flow Networks: A Recurrent Latent Variable Model for Density Modelling of Urban Mobility. (arXiv:2006.05256v2 [stat.ML] UPDATED)
    Mobility-on-demand (MoD) systems represent a rapidly developing mode of transportation wherein travel requests are dynamically handled by a coordinated fleet of vehicles. Crucially, the efficiency of an MoD system highly depends on how well supply and demand distributions are aligned in spatio-temporal space (i.e., to satisfy user demand, cars have to be available in the correct place and at the desired time). To do so, we argue that predictive models should aim to explicitly disentangle between temporal and spatial variability in the evolution of urban mobility demand. However, current approaches typically ignore this distinction by either treating both sources of variability jointly, or completely ignoring their presence in the first place. In this paper, we propose recurrent flow networks (RFN), where we explore the inclusion of (i) latent random variables in the hidden state of recurrent neural networks to model temporal variability, and (ii) normalizing flows to model the spatial distribution of mobility demand. We demonstrate how predictive models explicitly disentangling between spatial and temporal variability exhibit several desirable properties, and empirically show how this enables the generation of distributions matching potentially complex urban topologies.
    Minimax Estimation of Partially-Observed Vector AutoRegressions. (arXiv:2106.09327v2 [eess.SP] UPDATED)
    High-dimensional time series are a core ingredient of the statistical modeling toolkit, for which numerous estimation methods are known. But when observations are scarce or corrupted, the learning task becomes much harder. The question is: how much harder? In this paper, we study the properties of a partially-observed Vector AutoRegressive process, which is a state-space model endowed with a stochastic observation mechanism. Our goal is to estimate its sparse transition matrix, but we only have access to a small and noisy subsample of the state components. Interestingly, the sampling process itself is random and can exhibit temporal correlations, a feature shared by many realistic data acquisition scenarios. We start by describing an estimator based on the Yule-Walker equation and the Dantzig selector, and we give an upper bound on its non-asymptotic error. Then, we provide a matching minimax lower bound, thus proving near-optimality of our estimator. The convergence rate we obtain sheds light on the role of several key parameters such as the sampling ratio, the amount of noise and the number of non-zero coefficients in the transition matrix. These theoretical findings are discussed and illustrated by numerical experiments on simulated data.
    Generalized Multi-Output Gaussian Process Censored Regression. (arXiv:2009.04822v2 [stat.ML] UPDATED)
    When modelling censored observations, a typical approach in current regression methods is to use a censored-Gaussian (i.e. Tobit) model to describe the conditional output distribution. In this paper, as in the case of missing data, we argue that exploiting correlations between multiple outputs can enable models to better address the bias introduced by censored data. To do so, we introduce a heteroscedastic multi-output Gaussian process model which combines the non-parametric flexibility of GPs with the ability to leverage information from correlated outputs under input-dependent noise conditions. To address the resulting inference intractability, we further devise a variational bound to the marginal log-likelihood suitable for stochastic optimization. We empirically evaluate our model against other generative models for censored data on both synthetic and real world tasks and further show how it can be generalized to deal with arbitrary likelihood functions. Results show how the added flexibility allows our model to better estimate the underlying non-censored (i.e. true) process under potentially complex censoring dynamics.
    Better Parameter-free Stochastic Optimization with ODE Updates for Coin-Betting. (arXiv:2006.07507v3 [cs.LG] UPDATED)
    Parameter-free stochastic gradient descent (PFSGD) algorithms do not require setting learning rates while achieving optimal theoretical performance. In practical applications, however, there remains an empirical gap between tuned stochastic gradient descent (SGD) and PFSGD. In this paper, we close the empirical gap with a new parameter-free algorithm based on continuous-time Coin-Betting on truncated models. The new update is derived through the solution of an Ordinary Differential Equation (ODE) and solved in a closed form. We show empirically that this new parameter-free algorithm outperforms algorithms with the "best default" learning rates and almost matches the performance of finely tuned baselines without anything to tune.
    Saving Stochastic Bandits from Poisoning Attacks via Limited Data Verification. (arXiv:2102.07711v2 [cs.LG] UPDATED)
    We study bandit algorithms under data poisoning attacks in a bounded reward setting. We consider a strong attacker model in which the attacker can observe both the selected actions and their corresponding rewards and can contaminate the rewards with additive noise. We show that any bandit algorithm with regret $O(\log T)$ can be forced to suffer a regret $\Omega(T)$ with an expected amount of contamination $O(\log T)$. This amount of contamination is also necessary, as we prove that there exists an $O(\log T)$ regret bandit algorithm, specifically the classical UCB, that requires $\Omega(\log T)$ amount of contamination to suffer regret $\Omega(T)$. To combat such attacks, our second main contribution is to propose verification based mechanisms, which use limited verification to access a limited number of uncontaminated rewards. In particular, for the case of unlimited verifications, we show that with $O(\log T)$ expected number of verifications, a simple modified version of the ETC type bandit algorithm can restore the order optimal $O(\log T)$ regret irrespective of the amount of contamination used by the attacker. We also provide a UCB-like verification scheme, called Secure-UCB, that also enjoys full recovery from any attacks, also with $O(\log T)$ expected number of verifications. To derive a matching lower bound on the number of verifications, we prove that for any order-optimal bandit algorithm, this number of verifications $\Omega(\log T)$ is necessary to recover the order-optimal regret. On the other hand, when the number of verifications is bounded above by a budget $B$, we propose a novel algorithm, Secure-BARBAR, which provably achieves $O(\min\{C,T/\sqrt{B} \})$ regret with high probability against weak attackers where $C$ is the total amount of contamination by the attacker, which breaks the known $\Omega(C)$ lower bound of the non-verified setting if $C$ is large.
    A Manifold Two-Sample Test Study: Integral Probability Metric with Neural Networks. (arXiv:2205.02043v1 [stat.ML])
    Two-sample testing is an important problem that aims to determine whether two collections of observations follow the same distribution. We propose two-sample tests based on the integral probability metric (IPM) for high-dimensional samples supported on a low-dimensional manifold. We characterize the properties of the proposed tests with respect to the number of samples $n$ and the structure of the manifold with intrinsic dimension $d$. When an atlas is given, we propose a two-step test to identify the difference between general distributions, which achieves a type-II risk of order $n^{-1/\max\{d,2\}}$. When an atlas is not given, we propose a Hölder IPM test that applies to data distributions with $(s,\beta)$-Hölder densities, which achieves a type-II risk of order $n^{-(s+\beta)/d}$. To mitigate the heavy computational burden of evaluating the Hölder IPM, we approximate the Hölder function class using neural networks. Based on the approximation theory of neural networks, we show that the neural network IPM test has a type-II risk of order $n^{-(s+\beta)/d}$, the same order as the Hölder IPM test. Our proposed tests are adaptive to low-dimensional geometric structure because their performance crucially depends on the intrinsic dimension instead of the data dimension.
    Estimation of Standard Auction Models. (arXiv:2205.02060v1 [cs.GT])
    We provide efficient estimation methods for first- and second-price auctions under independent (asymmetric) private values and partial observability. Given a finite set of observations, each comprising the identity of the winner and the price they paid in a sequence of identical auctions, we provide algorithms for non-parametrically estimating the bid distribution of each bidder, as well as their value distributions under equilibrium assumptions. We provide finite-sample estimation bounds which are uniform in that their error rates do not depend on the bid/value distributions being estimated. Our estimation guarantees advance a body of work in Econometrics wherein only identification results have been obtained, unless the setting is symmetric, parametric, or all bids are observable. Our guarantees also provide computationally and statistically effective alternatives to classical techniques from reliability theory. Finally, our results are immediately applicable to Dutch and English auctions.
    The Grammar of Interactive Explanatory Model Analysis. (arXiv:2005.00497v4 [cs.LG] UPDATED)
    The growing need for in-depth analysis of predictive models leads to a series of new methods for explaining their local and global properties. Which of these methods is the best? It turns out that this is an ill-posed question. One cannot sufficiently explain a black-box machine learning model using a single method that gives only one perspective. Isolated explanations are prone to misunderstanding, leading to wrong or simplistic reasoning. This problem is known as the Rashomon effect and refers to diverse, even contradictory, interpretations of the same phenomenon. Surprisingly, most methods developed for explainable and responsible machine learning focus on a single-aspect of the model behavior. In contrast, we showcase the problem of explainability as an interactive and sequential analysis of a model. This paper proposes how different Explanatory Model Analysis (EMA) methods complement each other and discusses why it is essential to juxtapose them. The introduced process of Interactive EMA (IEMA) derives from the algorithmic side of explainable machine learning and aims to embrace ideas developed in cognitive sciences. We formalize the grammar of IEMA to describe potential human-model dialogues. It is implemented in a widely used human-centered open-source software framework that adopts interactivity, customizability and automation as its main traits. We conduct a user study to evaluate the usefulness of IEMA, which indicates that an interactive sequential analysis of a model increases the performance and confidence of human decision making.
    Accelerating Inhibitor Discovery for Multiple SARS-CoV-2 Targets with a Single, Sequence-Guided Deep Generative Framework. (arXiv:2204.09042v2 [q-bio.QM] UPDATED)
    The COVID-19 pandemic has highlighted the urgency for developing more efficient molecular discovery pathways. As exhaustive exploration of the vast chemical space is infeasible, discovering novel inhibitor molecules for emerging drug-target proteins is challenging, particularly for targets with unknown structure or ligands. We demonstrate the broad utility of a single deep generative framework toward discovering novel drug-like inhibitor molecules against two distinct SARS-CoV-2 targets -- the main protease (Mpro) and the receptor binding domain (RBD) of the spike protein. To perform target-aware design, the framework employs a target sequence-conditioned sampling of novel molecules from a generative model. Micromolar-level in vitro inhibition was observed for two candidates (out of four synthesized) for each target. The most potent spike RBD inhibitor also emerged as a rare non-covalent antiviral with broad-spectrum activity against several SARS-CoV-2 variants in live virus neutralization assays. These results show that a broadly deployable machine intelligence framework can accelerate hit discovery across different emerging drug-targets.
    Machine Learning based Framework for Robust Price-Sensitivity Estimation with Application to Airline Pricing. (arXiv:2205.01875v1 [stat.ML])
    We consider the problem of dynamic pricing of a product in the presence of feature-dependent price sensitivity. Based on the Poisson semi-parametric approach, we construct a flexible yet interpretable demand model where the price related part is parametric while the remaining (nuisance) part of the model is non-parametric and can be modeled via sophisticated ML techniques. The estimation of price-sensitivity parameters of this model via direct one-stage regression techniques may lead to biased estimates. We propose a two-stage estimation methodology which makes the estimation of the price-sensitivity parameters robust to biases in the nuisance parameters of the model. In the first-stage we construct the estimators of observed purchases and price given the feature vector using sophisticated ML estimators like deep neural networks. Utilizing the estimators from the first-stage, in the second-stage we leverage a Bayesian dynamic generalized linear model to estimate the price-sensitivity parameters. We test the performance of the proposed estimation schemes on simulated and real sales transaction data from Airline industry. Our numerical studies demonstrate that the two-stage approach provides more accurate estimates of price-sensitivity parameters as compared to direct one-stage approach.
    Local versions of sum-of-norms clustering. (arXiv:2109.09589v2 [cs.LG] UPDATED)
    Sum-of-norms clustering is a convex optimization problem whose solution can be used for the clustering of multivariate data. We propose and study a localized version of this method, and show in particular that it can separate arbitrarily close balls in the stochastic ball model. More precisely, we prove a quantitative bound on the error incurred in the clustering of disjoint connected sets. Our bound is expressed in terms of the number of datapoints and the localization length of the functional.
    Nonstationary Bandit Learning via Predictive Sampling. (arXiv:2205.01970v1 [cs.LG])
    We propose predictive sampling as an approach to selecting actions that balance between exploration and exploitation in nonstationary bandit environments. When specialized to stationary environments, predictive sampling is equivalent to Thompson sampling. However, predictive sampling is effective across a range of nonstationary environments in which Thompson sampling suffers. We establish a general information-theoretic bound on the Bayesian regret of predictive sampling. We then specialize this bound to study a modulated Bernoulli bandit environment. Our analysis highlights a key advantage of predictive sampling over Thompson sampling: predictive sampling deprioritizes investments in exploration where acquired information will quickly become less relevant.
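For orientation, the stationary baseline the abstract refers to can be sketched as follows. This is a minimal Beta-Bernoulli Thompson sampling loop, not the paper's predictive sampling algorithm; the function name, arm probabilities, horizon, and seed are illustrative assumptions.

```python
import random

def thompson_sampling(true_probs, horizon, seed=0):
    """Run Beta-Bernoulli Thompson sampling; return how often each arm was pulled."""
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms
    for _ in range(horizon):
        # Draw one sample per arm from its Beta(1 + successes, 1 + failures) posterior.
        samples = [rng.betavariate(successes[a] + 1, failures[a] + 1)
                   for a in range(n_arms)]
        arm = max(range(n_arms), key=samples.__getitem__)
        # Simulated environment (assumption): stationary Bernoulli rewards.
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls
```

Because the posterior here never forgets old observations, this baseline keeps trusting stale data when the reward probabilities drift, which is the setting the abstract says predictive sampling is designed for.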
    Sparse Representations of Positive Functions via First and Second-Order Pseudo-Mirror Descent. (arXiv:2011.07142v4 [stat.ML] UPDATED)
    We consider expected risk minimization problems when the range of the estimator is required to be nonnegative, motivated by the settings of maximum likelihood estimation (MLE) and trajectory optimization. To facilitate nonlinear interpolation, we hypothesize that the search space is a Reproducing Kernel Hilbert Space (RKHS). We develop first and second-order variants of stochastic mirror descent employing (i) pseudo-gradients and (ii) complexity-reducing projections. Compressive projection in the first-order scheme is executed via kernel orthogonal matching pursuit (KOMP), which overcomes the fact that the vanilla RKHS parameterization grows unbounded with the iteration index in the stochastic setting. Moreover, pseudo-gradients are needed when gradient estimates for cost are only computable up to some numerical error, which arise in, e.g., integral approximations. Under constant step-size and compression budget, we establish tradeoffs between the radius of convergence of the expected sub-optimality and the projection budget parameter, as well as non-asymptotic bounds on the model complexity. To refine the solution's precision, we develop a second-order extension which employs recursively averaged pseudo-gradient outer-products to approximate the Hessian inverse, whose convergence in mean is established under an additional eigenvalue decay condition on the Hessian of the optimal RKHS element, which is unique to this work. Experiments demonstrate favorable performance on inhomogeneous Poisson Process intensity estimation in practice.
    Bézier Curve Gaussian Processes. (arXiv:2205.01754v1 [stat.ML])
    Probabilistic models for sequential data are the basis for a variety of applications concerned with processing timely ordered information. The predominant approach in this domain is given by neural networks, which incorporate either stochastic units or components. This paper proposes a new probabilistic sequence model building on probabilistic Bézier curves. Using Gaussian distributed control points, these parametric curves pose a special case for Gaussian processes (GP). Combined with a Mixture Density network, Bayesian conditional inference can be performed without the need for mean field variational approximation or Monte Carlo simulation, which is a requirement of common approaches. For assessing this hybrid model's viability, it is applied to an exemplary sequence prediction task. In this case the model is used for pedestrian trajectory prediction, where a generated prediction also serves as a GP prior. Following this, the initial prediction can be refined using the GP framework by calculating different posterior distributions, in order to adapt more towards a given observed trajectory segment.  ( 2 min )
    Do More Negative Samples Necessarily Hurt in Contrastive Learning?. (arXiv:2205.01789v1 [cs.LG])
    Recent investigations in noise contrastive estimation suggest, both empirically as well as theoretically, that while having more "negative samples" in the contrastive loss improves downstream classification performance initially, beyond a threshold, it hurts downstream performance due to a "collision-coverage" trade-off. But is such a phenomenon inherent in contrastive learning? We show in a simple theoretical setting, where positive pairs are generated by sampling from the underlying latent class (introduced by Saunshi et al. (ICML 2019)), that the downstream performance of the representation optimizing the (population) contrastive loss in fact does not degrade with the number of negative samples. Along the way, we give a structural characterization of the optimal representation in our framework, for noise contrastive estimation. We also provide empirical support for our theoretical results on CIFAR-10 and CIFAR-100 datasets.  ( 2 min )

  • Open

    [D] What do you think about PolyLoss?
    Paper - PolyLoss: A Polynomial Expansion Perspective Of Classification Loss Functions. One reviewer believes that "this paper makes some interesting and thorough findings by approximating cross entropy loss using Taylor expansion." Another reviewer mentioned "In my comparisons it performed worse than LabelSmoothingCrossEntropy." Link: https://doublind.com/paper/2204.12511-PolyLoss:-A-Polynomial-Expansion-Perspective-of-Classification-Loss-Functions Have you read this paper? What do you think? https://preview.redd.it/i43pynaxijx81.png?width=1559&format=png&auto=webp&s=640396a8fc4c4691730e52f8a1a92a9c1e57a3c5 submitted by /u/DouBlindDotCOM [link] [comments]  ( 1 min )
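For readers who have not seen the idea, the Taylor-expansion view mentioned by the first reviewer can be sketched in a few lines. Cross-entropy on the true-class probability p expands as -log(p) = sum over j >= 1 of (1 - p)^j / j, and the simplest variant described in the paper (Poly-1, as far as we recall it) adds an epsilon-weighted copy of the first term. The function names and the default epsilon below are assumptions made for illustration, not the authors' code.

```python
import math

def cross_entropy(p_true):
    """Cross-entropy for a single example, given the predicted true-class probability."""
    return -math.log(p_true)

def taylor_cross_entropy(p_true, n_terms):
    """Truncated expansion of -log(p) in powers of (1 - p)."""
    return sum((1.0 - p_true) ** j / j for j in range(1, n_terms + 1))

def poly1_loss(p_true, epsilon=1.0):
    """Cross-entropy plus an extra epsilon-weighted first polynomial term."""
    return cross_entropy(p_true) + epsilon * (1.0 - p_true)
```

With epsilon set to 0 the Poly-1 sketch reduces to plain cross-entropy, so any comparison against label smoothing depends on how epsilon is tuned.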
    [R] crop video when a character appears.
    I don't know if this has a name, but I would like to know if there is an AI tool that can cut a video down to the frames in which a given character appears. That is, the input is a video in which many characters appear, and the output is a video containing all the frames where the character I am interested in appears. submitted by /u/macob12432 [link] [comments]  ( 1 min )
    [D] Proving Convergence of hybridlike training of DNNs
    Hi all! I am working on a submission for a specialized prediction task where the inputs go from space A->B->C. I have model F which predicts from A->B and model G which predicts from B->C. Now, I have trained these models in 3 different ways: (1) Train F first and then G. The G model has a minimum, and the F model has a minimum. (2) Train F and G jointly. The combined model has a global minimum. (3) Train F, freeze G for the first half of the epochs, and then train the complete model for the rest of the epochs. For approach 3, I am struggling to identify what the minimum of the model would be. In other words, is it possible to prove that approach 3 converges? PS: I have GT labels for B and C space. Loss is calculated as a*CE(B)+b*CE(C). submitted by /u/dwight_funke [link] [comments]  ( 1 min )
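To make the setup concrete, here is a minimal sketch of the stated loss a*CE(B)+b*CE(C) and of the schedule in approach 3. It only restates the post in code and says nothing about convergence; the function names, the probability-list inputs, and the exact halfway cut-off are assumptions.

```python
import math

def cross_entropy(probs, label):
    """Cross-entropy of one example given class probabilities and the true label index."""
    return -math.log(probs[label])

def combined_loss(probs_b, label_b, probs_c, label_c, a=1.0, b=1.0):
    """Weighted sum a*CE(B) + b*CE(C) over the intermediate and final outputs."""
    return a * cross_entropy(probs_b, label_b) + b * cross_entropy(probs_c, label_c)

def trainable_models(epoch, total_epochs):
    """Approach 3: only F is updated in the first half, then F and G together."""
    return ("F",) if epoch < total_epochs // 2 else ("F", "G")
```

Note that the objective itself is the same in both phases; only the set of parameters being updated changes, which is what makes the question of "the minimum" in approach 3 subtle.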
    [R] Question about multi-hop question answering using knowledge graphs
    Is anyone familiar with any papers / datasets that are doing more than 3 multi-hops? Seems like every paper / dataset I look at is either 1, 2, or 3 hops at best. I'm particularly interested in 10+ hops. submitted by /u/DaBobcat [link] [comments]
    FastAPI vs Flask? "[Discussion]"
    Hey all, I host a podcast and recently interviewed Sebastián Ramirez, the creator of FastAPI. Aside from the cool convo, I have been noticing lots of talk about FastAPI potentially replacing Flask. I also saw lots of FastAPI love in this thread in the MLOps Community, where I asked which one people generally use these days. I'm interested in getting more data points and kicking off a discussion to hear how others look at this one. Is Flask still your go-to? Do you use both? Which one are you opinionated about, and why? submitted by /u/dpbrinkm [link] [comments]  ( 1 min )
    [D] : HELP Finding a Book - A book written for Google Engineers about foundational Math to support ML
    I am searching for a book that was written to help engineers and new hires get up to speed with foundational mathematics in support of ML research at Google. It was not so much focused on ML, but more so a handbook on pure mathematics, such as proofs (at least that's what I got from the first chapter and what I remember from my waning memory). It was written in a very intuitive way and I can't for the life of me find it. Does this ring a bell for anyone? submitted by /u/nutin2chere [link] [comments]  ( 1 min )
    [N] New Google Deepmind Flamingo Tackles Multiple Tasks With Single Visual Language Model
    Deepmind recently introduced Flamingo, a single visual language model (VLM) that sets a new state of the art in few-shot learning on a wide range of open-ended multimodal tasks. This means Flamingo can tackle a number of difficult problems with just a handful of task-specific examples (in a “few shots”), without any additional training required. Flamingo’s simple interface makes this possible, taking as input a prompt consisting of interleaved images, videos, and text and then output associated language. Similar to the behaviour of large language models (LLMs), which can address a language task by processing examples of the task in their text prompt, Flamingo’s visual and text interface can steer the model towards solving a multimodal task. Given a few example pairs of visual inputs and expected text responses composed in Flamingo’s prompt, the model can be asked a question with a new image or video, and then generate an answer. Full website and whitepaper here Video here submitted by /u/SlightSituation [link] [comments]  ( 1 min )
    [P] Making Clubhouse's hallway more relevant with machine learning
    Hi there! Clubhouse shared a case-study on how it ranked recommendations for real-time, short-lived rooms in its Hallway with GBDTs and fast features. Thought you would find it interesting! Post here: https://blog.clubhouse.com/making-the-hallway-more-relevant-with-machine-learning/ Live Q&A: https://www.clubhouse.com/event/MOp41drX?utm_medium=ch_event&utm_campaign=sI95qy9i-EC5I3MvlueR7g-176282 submitted by /u/speedbreeze [link] [comments]  ( 1 min )
    [D] Accuracy Convergent Field Predictors
    Hello, Here is a paper on supervised machine learning topics: https://www.researchgate.net/publication/360264212_Accuracy_Convergent_Field_Predictors It discusses accuracy convergence for predictive algorithms. Techniques are presented for making algorithms achieve accuracy convergence. It also provides an analogy with the sampling of fields in physics. I am interested in your opinions. Thanks. submitted by /u/crispub [link] [comments]
    Why TensorFlow's API is more complete than PyTorch for everyday deep learning [D]
    What are some of the usability gaps in PyTorch (torch) compared with TensorFlow/Keras (TF)? Sure, each one of these challenges is surmountable for an advanced user, but together they add up to a more tedious experience. Disclaimer: Please take this with a grain of salt and feel free to jump in with corrections. I am primarily a tf user who has only dabbled with torch. I welcome your help in drawing a more accurate comparison. I'm the creator of github.com/aiqc/aiqc, which abstracts/supports both tf and torch because they have the same components. --- 1. No NumPy input/output for models. TF: allows you to feed your model an ndarray as input directly and return ndarray output. This is great because sklearn preprocessing and metrics use ndarrays. It makes for…  ( 6 min )
    [R] Using few-shot learning language models as weak supervision
    Large language models embed a lot of useful knowledge in their pre-trained weights, but they are typically insufficient solutions on their own, either due to knowledge gaps or inability to transfer what they know. But there’s another way. → https://snorkel.ai/few-shot-learning-large-language-models/ At Snorkel, we have seen practitioners get more utility from language models by applying these zero-shot or few-shot learners as labeling functions in a weak supervision framework. This tends to outperform both a language model and non-language-model LFs used in isolation. submitted by /u/robiriondo [link] [comments]  ( 1 min )
    [D] How to detect Covariate shift of NLP models?
    I have an NLP model, for example, Sentiment Analysis. This model serves in production. I want to detect Data Drift, and specifically Covariate Shift, for this model. I saw that Cosine Similarity may solve this issue, but I have two concerns. First, the ability to calculate it: Cosine similarity can be calculated for vectors that are in the same vector space. Do all of the embeddings that a model produces live in the same space? Second, the time complexity: If I have 1M training data points and 1M prediction data points and I want to infer the average Cosine difference, I'll have to find the cosine difference of every prediction compared to every training data point. I can sample, but what is the sampling algorithm? I'd love to hear your opinions on different solutions for calculating Covariate shift for NLP models and on my concerns about cosine similarity. submitted by /u/igaloly [link] [comments]  ( 1 min )
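On the time-complexity concern, one observation may help: the average cosine similarity over all training/prediction pairs equals the dot product of the two mean unit vectors, so it can be computed in one pass over each set without comparing every pair. A minimal sketch, assuming both sets come from the same encoder and contain no zero vectors (all function names are illustrative):

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def mean_pairwise_cosine(train, prod):
    """Average cosine similarity over all (train, prod) pairs in O(N + M) time."""
    mu_train = mean_vector([normalize(v) for v in train])
    mu_prod = mean_vector([normalize(v) for v in prod])
    return sum(a * b for a, b in zip(mu_train, mu_prod))

def brute_force_mean_cosine(train, prod):
    """Reference O(N * M) computation, included only to check the shortcut."""
    total = 0.0
    for u in train:
        for v in prod:
            total += sum(a * b for a, b in zip(normalize(u), normalize(v)))
    return total / (len(train) * len(prod))
```

A single averaged number is a coarse drift signal; it can stay flat while the shape of the distribution changes, so it is best treated as a first check and not as a full covariate-shift test.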
    Architecture suggestion for multi-class classification task on images with captions [D]
    I am currently starting on a multi-class classification task on images that come with short descriptions. We are given training data consisting of images, their captions, and labels, and we need to construct an architecture to classify the label of a new image with its caption. I have done some research online and was told that the best architecture for multi-class classification of images would be a combination of CNN and RNN, but I couldn't find anything on how to use the captions together with the images for training. Any advice on where I should start? submitted by /u/anzhuoxianshen [link] [comments]  ( 1 min )
    [R][P][D] Available approaches to identify specific sentences in a text
    Hey guys, I am working on an NLP pipeline research project and want to reconsider one of its tasks. Currently, a heuristic is used to extract the target sentence that is most similar (centroid) to the overall text (the set of sentences which contains the target sentence). The target sentence would be the root of a graph I want to construct from the text. I have training data available with annotated sentences. Is there any recent approach I could implement to solve this problem? I thought about reformulating it as a question-answering approach using a Transformer, but I would like to retrieve a more or less identical version of the target sentence. Looking forward to your answers! submitted by /u/wastingmytime69 [link] [comments]  ( 1 min )
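The centroid heuristic described in the post can be sketched as below. This is only one plausible reading of it, using whitespace tokens and raw term counts where the actual pipeline presumably uses learned sentence embeddings; every name here is an assumption.

```python
import math
from collections import Counter

def bag_of_words(sentence):
    """Lower-cased whitespace tokens with their counts."""
    return Counter(sentence.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def most_central_sentence(sentences):
    """Index of the sentence closest to the centroid of all sentence vectors."""
    vectors = [bag_of_words(s) for s in sentences]
    centroid = Counter()
    for vec in vectors:
        centroid.update(vec)  # summing suffices: cosine ignores overall scale
    scores = [cosine(vec, centroid) for vec in vectors]
    return max(range(len(sentences)), key=scores.__getitem__)
```

Since annotated target sentences are available, the same scoring function could serve as a baseline against which a supervised sentence classifier or ranker is compared.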
    [P] Yaetos-A framework to simplify the creation of data pipelines
    Hi everyone, I would like to share a blog post explaining Yaetos, an open source data framework I created some time ago and used in previous companies : https://medium.com/@arthurprevot/yaetos-data-framework-description-ddc71caf6ce . It is meant for data scientists, engineers and analysts to create and schedule data pipelines and ML models in the AWS cloud. Any feedback on the tool or the article is welcome ! Thanks ! submitted by /u/arthurprevot [link] [comments]  ( 1 min )
    [P] Anomaly detection with similarity learning approach.
    Hi everyone! Anomaly detection is one of the exciting problems where metric learning can demonstrate an advantage over classical approaches. This case study illustrates how to do this with a practical example of quality control for coffee beans. How to train a detector of spoiled coffee beans with just a couple hundred labeled examples. https://qdrant.tech/articles/detecting-coffee-anomalies/ submitted by /u/devzaya [link] [comments]  ( 1 min )
    [N] D4 Data presents Podcast #15 "Federated Learning with Flower"
    Flower becomes international. Interest in federated learning is growing, and so is interest in our open-source federated learning framework Flower (https://flower.dev/). In federated learning, we do not collect data to train AI models; instead, we train AI models inside the data silos, collect only the AI models, and aggregate them to create a global AI model. The global AI model has the knowledge of all data silos but has never seen their data. Therefore, federated learning connects data silos in a privacy-preserving manner. Many people already understand this functionality, but some questions are still unanswered, such as: What is the difference between edge computing and federated learning? What are the use cases of federated learning? Can federated learning reduce the carbon footprint? If you want to know the answers, then check out this podcast that was recorded by D4 Data Podcast. In addition, the history of federated learning and the differences between centralized learning and federated learning are presented so that newcomers to federated learning can also easily understand the technology. https://www.youtube.com/watch?v=EFupbmLfkwQ submitted by /u/burnai [link] [comments]  ( 1 min )
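The aggregation step described above ("collect only the AI models and aggregate them") is often done by weighted parameter averaging. The sketch below illustrates that idea in plain Python; it is not Flower's API, and weighting each silo by its number of training examples is an assumption the post does not state.

```python
def federated_average(client_weights, client_sizes):
    """Average flat parameter lists from several silos, weighted by local dataset size."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(weights[i] * size for weights, size in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]
```

For example, two silos with parameters `[1.0, 2.0]` and `[3.0, 4.0]` and 1 and 3 local examples respectively would yield a global model closer to the second silo's parameters. Only these parameter lists leave the silos; the raw data never does.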
    [D] [P] Trying to guess internal rules of an insurance company scoring mechanism
    Hi everyone! I'm relatively new to ML, so maybe I'm reinventing the wheel here, but please hear me out and maybe give some advice. Let's say I'm an insurance broker providing my clients' data to an insurance company to get a quote. The insurance company can either give me one or refuse to quote. The characteristics of the clients are encoded in a number of numerical values: age, car horsepower, a coefficient depending on previous loss record, a coefficient depending on territory, and so on. So all values are numerical. The decision of the insurance company is based on some internal rules it has. For example: we don't insure drivers with a loss record coefficient bigger than N, or some other rules. Unfortunately, the company doesn't provide me with these rules. So I'd like to guess them to understand my target audience better and focus my marketing efforts only on those potential insureds that will almost certainly be provided with a quote by the insurance company. To achieve this, I'm planning to do the following: build a model that will predict the outcome of approaching the insurance company (1 - they agree to quote, 0 - they refuse) based on the historic data of quotes and refusals I have on file. Then I will take an "average successful" quote and start changing its parameters one by one to see when my model returns 0 (insurance company refused to quote). By doing so, I will try to guess the boundaries of the coefficients in my data, meaning the internal rules of the insurance company. What do you think of this? How viable is this approach? submitted by /u/alex-and-r [link] [comments]  ( 2 min )
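The one-parameter-at-a-time probing described in the post can be sketched as a bisection over a single feature of a trained model. Everything named here is hypothetical: `toy_model` merely stands in for whatever classifier is fitted to the historic quotes, and the sketch assumes the prediction flips from 1 to 0 exactly once as the feature increases.

```python
def find_refusal_threshold(predict, baseline, feature, low, high, tol=1e-3):
    """Bisect one feature of a baseline client to find where predictions flip from 1 to 0.

    Returns None unless the model accepts at `low` and refuses at `high`.
    """
    def accepted(value):
        probe = dict(baseline)
        probe[feature] = value
        return predict(probe) == 1

    if not accepted(low) or accepted(high):
        return None
    while high - low > tol:
        mid = (low + high) / 2
        if accepted(mid):
            low = mid
        else:
            high = mid
    return (low + high) / 2

def toy_model(client):
    """Hypothetical stand-in for a trained classifier (not a real insurer's rules)."""
    return 1 if client["loss_coef"] <= 1.5 and client["age"] >= 21 else 0
```

Two caveats apply: the thresholds recovered this way are those of the fitted model, which may differ from the insurer's real rules where historic data is sparse, and varying one feature at a time cannot reveal rules that depend on combinations of features.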
    [N] Jupyter Notebook Competition - Data science
    Hi everyone! I'd like to share news about a competition you might be interested in. It's for those interested in and passionate about #coding and #datascience / #bigdata. It's the Jupyter Notebook Competition, which will run until 31 July 2022. Participants can choose between 4 tracks and develop/improve a notebook. You get to showcase your skills, uncover new insights on Copernicus data usage, and potentially win a cash prize from a total pool of €5,000! For more info: https://notebook.wekeo.eu/ The competition was funded by the #Copernicus programme and developed as a joint project with EUMETSAT, European Centre for Medium-Range Weather Forecasts, Mercator Ocean International and the European Environment Agency. submitted by /u/NbCompetition_WEkEO [link] [comments]  ( 1 min )
    [D]: What does Replika exactly consist of? (besides GPT-neo)
    I just wondered, besides the visual stuff, how does this chatbot work? There are unmaintained libraries like cakechat on Replika's GitHub, but if someone could break down Replika into its core components, that would be great. I imagine something like this: (1) a for loop to accept the user's input, (2) something to transform the input into something better suited for GPT-neo (what?), (3) the GPT-neo generator, (4) something to transform the output into something suitable for the user (what?). How do those 2 extra layers most likely work? Any ideas on how to get deeper into it? submitted by /u/GerritTheBerrit [link] [comments]  ( 1 min )
    [P] Kaggle competition : Be honest, What do you think of this image? (Unsplash)
    Hello Reddit, I am writing this post to promote a Kaggle competition that I built to test the DVC framework. The goal of the competition is to predict the interest (a combination of clicks, views and age) that an image can bring. Competition: https://www.kaggle.com/competitions/be-honest-what-do-you-think-of-this-image Have fun (any feedback is welcome). submitted by /u/jeanmidev [link] [comments]  ( 1 min )
  • Open

    GraphWorld: Advances in Graph Benchmarking
    John Palowitch and Anton Tsitsulin, Research Scientists, Google Research, Graph Mining team Graphs are very common representations of natural systems that have connected relational components, such as social networks, traffic infrastructure, molecules, and the internet. Graph neural networks (GNNs) are powerful machine learning (ML) models for graphs that leverage their inherent connections to incorporate context into predictions about items within the graph or the graph as a whole. GNNs have been effectively used to discover new drugs, help mathematicians prove theorems, detect misinformation, and improve the accuracy of arrival time predictions in Google Maps. A surge of interest in GNNs during the last decade has produced thousands of GNN variants, with hundreds introduced each year. …  ( 8 min )
  • Open

    Wow ! So that's how a Data Science field is carried!!
    submitted by /u/networkninja10 [link] [comments]
    Another Firing Among Google’s A.I. Brain Trust, and More Discord
    submitted by /u/BB4evaTB12 [link] [comments]
    AI Dream 42 - Dramatic End of the World ByeBye City
    submitted by /u/LordPewPew777 [link] [comments]
    AI News | Breakthrough AI Robot Arm Pick And Place System | New Google Deepmind Flamingo Visual Language Model
    submitted by /u/getrich_or_diemining [link] [comments]
    Ripples in Time (made with starryai)
    submitted by /u/Losthel [link] [comments]
    DALL-E 2 is amazing, but what's even cooler is how it actually *understands* text and produces images. (Article version linked in description)
    submitted by /u/OnlyProggingForFun [link] [comments]
    Generate images from text with Latent Diffusion LAION-400M
    submitted by /u/Illustrious_Row_9971 [link] [comments]  ( 3 min )
    Deepmind Introduces Flamingo: An Open-Ended Single Visual Language Model (VLM) For Multimodal Machine Learning Research
    Intelligence measures how quickly a person can adjust to a new situation using only a few simple instructions. Despite the contrasts between the two, children may recognize real animals in the zoo after seeing a few photographs of the animals in a book. On the other hand, typical visual models do not yet reflect this level of human intellect. They need to be trained on tens of thousands of examples that have been explicitly annotated for that task. If the goal is to count and identify animals in an image, such as “three zebras,” thousands of photographs must be collected and each image annotated with the number and species of the animals it shows. The requirement to train a new model each time it is confronted with a new job is the most predominant drawback, making the process inefficient, costly, and resource…  ( 1 min )
    Better text search
    Can somebody recommend a better way for me to search through a collection of 50,000 document descriptions for concepts that may not appear verbatim in a description but are contextually relevant? Example: searching for "fintech" should return a company description that deals with finances but doesn't necessarily have the word "fintech" in it. submitted by /u/CanadianBacon2021 [link] [comments]  ( 1 min )
    Has anyone heard of or participated in the Persolv AI boot camp?
    It offers some really cool opportunities but it’s extremely expensive. Was wondering if anyone has had any experience with it that they would be willing to share. Seems really cool, but I don’t want to get scammed out of thousands. submitted by /u/eatingbaboons [link] [comments]  ( 1 min )
  • Open

    Deploy and manage machine learning pipelines with Terraform using Amazon SageMaker
    AWS customers are relying on Infrastructure as Code (IaC) to design, develop, and manage their cloud infrastructure. IaC ensures that customer infrastructure and services are consistent, scalable, and reproducible, while being able to follow best practices in the area of development operations (DevOps). One possible approach to manage AWS infrastructure and services with IaC is […]  ( 6 min )
  • Open

    Computer Vision development services: a brief introduction
    Talking about the main Computer Vision development services and identifying key activities, goals, and outcomes to simplify…  ( 5 min )
  • Open

    Setting AIs on SIGGRAPH: Top Academic Researchers Collaborate With NVIDIA to Tackle Graphics’ Greatest Challenges
    NVIDIA’s latest academic collaborations in graphics research have produced a reinforcement learning model that smoothly simulates athletic moves, ultra-thin holographic glasses for virtual reality, and a real-time rendering technique for objects illuminated by hidden light sources. These projects — and over a dozen more — will be on display at SIGGRAPH 2022, taking place Aug. The post Setting AIs on SIGGRAPH: Top Academic Researchers Collaborate With NVIDIA to Tackle Graphics’ Greatest Challenges appeared first on NVIDIA Blog.  ( 5 min )
  • Open

    `ReinforcementLearning.jl` presentation in the Julia User Group Munich
    https://youtu.be/ckIIxKM14Ow submitted by /u/kir0ul [link] [comments]
    Train your first Deep Reinforcement Learning agent to land correctly on the moon 🌕 (Deep Reinforcement Learning Free Class by Hugging Face 🤗)
    submitted by /u/chinacat2002 [link] [comments]  ( 1 min )
    Train your first Deep Reinforcement Learning agent to land correctly on the moon 🌕 (Deep Reinforcement Learning Free Class by Hugging Face 🤗)
    Hey there! We're happy to announce that we just published the first Unit of the Deep Reinforcement Learning Class 🥳 In this Unit, you'll learn the foundations of Deep RL. And you’ll train your first lander agent 🚀 to land correctly on the moon 🌕 using Stable-Baselines3 and share it with the community. You’ll be able to compare the results of your LunarLander-v2 with your classmates using the leaderboard 🏆 👉 https://huggingface.co/spaces/ThomasSimonini/Lunar-Lander-Leaderboard 1️⃣ The introduction to deep learning article 👉 https://huggingface.co/blog/deep-rl-intro 2️⃣ The hands-on 👉 https://github.com/huggingface/deep-rl-class/blob/main/unit1/unit1.ipynb 3️⃣ The leaderboard 👉 https://huggingface.co/spaces/ThomasSimonini/Lunar-Lander-Leaderboard If you have questions or feedback, I would love to answer them. https://preview.redd.it/3s48ncsf2hx81.png?width=1920&format=png&auto=webp&s=0a08759ee9110be333208f27efd369818488d0c7 submitted by /u/cranthir_ [link] [comments]  ( 1 min )
    How to handle problems where the size of the state is dependent on actions and available actions depends on the state?
    Hello guys, quite new to this group and RL. The problem I want to deal with is a more complicated version of the allocation/assignment problem. Here the state space is not only large, but the size of the state also changes for different situations. And the action space is the same as the state space. Are there any examples/algorithms I should look into? submitted by /u/Blue_Dude3 [link] [comments]  ( 2 min )
    Performance of policy (reward) massively deteriorates after a certain amount of iterations
    Hi all, as you can see below in the plot "rewards", the rewards seem to be really good for a few iterations, but then deteriorate again and collapse from 50k iterations onward. Is there any method to prevent the reward from swinging so much and make it increase more steadily? (Decreasing the learning rate didn't help...) What does the low reward from 50k iterations imply? https://preview.redd.it/gz9tzfbuxex81.png?width=1568&format=png&auto=webp&s=ef5cf4356ba67ed27c61b0648d9d8519773a66c1 submitted by /u/Fun-Moose-3841 [link] [comments]  ( 1 min )
  • Open

    Improve Performance and Data Availability with Elastic Block Store (EBS)
    Author: Bokang Zhang, Chenhao Huang. Post Summary: Nowadays, many Database-as-a-Service (DBaaS) solutions separate the computation layer and the storage layer. These include, for example, Amazon Aurora and Google BigQuery. This solution is attractive, as the data storage and data replication can be handled by existing services. DBaaS removes the need to worry about this… The post Improve Performance and Data Availability with Elastic Block Store (EBS) appeared first on Data Science Central.  ( 6 min )
    Seven data-centric architecture innovations that businesses can’t afford to overlook
    My mentors and compadres at Semantic Arts hold a Data-Centric Architecture Forum (DCAF) every year in Fort Collins, Colorado. It’s a chance for technically inclined data and data modeling enthusiasts to brainstorm on the gaps in the architecture and how to fill those gaps. This year’s DCAF takes place on June 6th – 8th, 2022.… The post Seven data-centric architecture innovations that businesses can’t afford to overlook appeared first on Data Science Central.  ( 4 min )
    Today’s Data Purgatory and a History Lesson from the Energy Industry
    Now I shall sing the second kingdom there where the soul of man is cleansed, made worthy to ascend to Heaven. Dante Alighieri, The Divine Comedy, Purg. I.4–9 In its most recent (April 2022) Data and Analytics Trends report, Gartner made a number of good points in and around the subject of AI and how… The post Today’s Data Purgatory and a History Lesson from the Energy Industry appeared first on Data Science Central.  ( 4 min )
    Handling SQL Server Database Corruption When Your File System Fails
    We have designed a lab environment to do some tests on some devices. One of the devices that we tested was a WD My Book World Edition 1TB (White Light). When we plugged in the power cord, it sounded like it started up fine; however, the web interface was not accessible and there was… The post Handling SQL Server Database Corruption When Your File System Fails appeared first on Data Science Central.  ( 4 min )
    The Advent of Killer Robots
    Next-generation warfare involves human-machine teaming (HMT). Human-AI interfaces are more error prone than traditional combat. Autonomous robots as a solution pose ethical challenges. At the core of the military’s use of human-machine teaming (HMT) is artificial intelligence used to program flights and target drone strikes. But while drones remove combat pilots from dangerous situations, the… The post The Advent of Killer Robots appeared first on Data Science Central.  ( 3 min )
    The executable digital twin
    Recently I read that Siemens has defined a new term called ‘executable digital twin’. If you have worked in the space of digital twins before, you know that there is no shortage of definitions for digital twins! At first, I thought, why muddy the waters with yet one more definition? But I think there is… The post The executable digital twin appeared first on Data Science Central.  ( 3 min )
    Some Helpful Tips to Choose the Best Domain Registrar
    When you’re planning to purchase a domain, you need to go through a domain registrar. Apart from the overwhelming process of choosing a domain name, you also need to be careful while choosing the domain registrar. Domain registrars are reputed and professional companies that sell different types of domain names for the websites and manage… The post Some Helpful Tips to Choose the Best Domain Registrar appeared first on Data Science Central.  ( 3 min )
    Green Banking for Sustainability and Responsible Environmental Protection
    Green banking's a way of banking that aims for sustainability and responsibility in order to protect the environment. The post Green Banking for Sustainability and Responsible Environmental Protection appeared first on Data Science Central.  ( 4 min )
    Benefits of Data Governance
    Data governance is the process of managing the data’s usability, security, availability, and quality within an organization using internally set and enforced rules and policies. Data governance is a must for any organization that seeks to use its data for analysis. It creates an environment where data can thrive as a source of useful insight… The post Benefits of Data Governance appeared first on Data Science Central.  ( 3 min )
    Smart Cities of the Future- Powered by IoT
    A smart city is an urban zone which uses different types of electronic systems and sensors to collect data. The smart city concept combines information and communication technology and several physical devices linked to Internet of Things networks to enhance adeptness of the city operations and services. The smart city concept is gaining important market… The post Smart Cities of the Future- Powered by IoT appeared first on Data Science Central.  ( 2 min )
    Is Detailed Design Anti-Agile?
    In the Beginning . . . In days of yore, systems development projects were front-ended with laborious requirements engineering and design tasks. This made sense then because development was labor-intensive, time-consuming, and expensive. Changes to the scope or design of a solution mid-development increased the likelihood of errors and incremental time and expense. In recognition… The post Is Detailed Design Anti-Agile? appeared first on Data Science Central.  ( 9 min )
    5 data trends business leaders should anticipate in 2022
    Not surprisingly, digital transformation is a prerequisite for forward-thinking businesses. The catastrophic disruption of the global pandemic did not slow down the need for systems, processes, and people who will help modern organizations move faster. Data, as always, is top of mind. With so many trends and tools available, it can be hard to see… The post 5 data trends business leaders should anticipate in 2022 appeared first on Data Science Central.  ( 6 min )
  • Open

    Does anyone have a good code example of an MLP Neural Network with Momentum implemented?
    I have been asked to write MLP neural network code from scratch with learning rate and momentum implemented. submitted by /u/TobiasFred [link] [comments]  ( 1 min )
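    For readers with the same question, here is a minimal sketch of what such code might look like, not a reference implementation. It assumes classical (heavy-ball) momentum, a single tanh hidden layer, a sigmoid output, mean squared error, and the XOR toy problem. The function names `momentum_step` and `train_xor`, the layer sizes, and the hyperparameters are all illustrative choices, not something taken from the post.

```python
import math
import random


def momentum_step(weights, grads, velocity, lr=0.1, momentum=0.9):
    """Classical momentum, in place: v <- momentum * v - lr * grad, then w <- w + v."""
    for i, g in enumerate(grads):
        velocity[i] = momentum * velocity[i] - lr * g
        weights[i] += velocity[i]


def train_xor(epochs=2000, hidden=3, lr=0.1, momentum=0.9, seed=0):
    """Train a 2-input, one-hidden-layer MLP on XOR; return the mean squared error per epoch."""
    rng = random.Random(seed)
    data = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
    # Flat parameter list: [w_x1, w_x2, bias] per hidden unit, then output weights, then output bias.
    out = 3 * hidden                      # index where the output-layer weights start
    n_params = out + hidden + 1
    w = [rng.uniform(-0.5, 0.5) for _ in range(n_params)]
    v = [0.0] * n_params                  # one velocity entry per parameter
    losses = []
    for _ in range(epochs):
        grads = [0.0] * n_params
        loss = 0.0
        for (x1, x2), target in data:
            # Forward pass: tanh hidden layer, sigmoid output.
            h = [math.tanh(w[3 * j] * x1 + w[3 * j + 1] * x2 + w[3 * j + 2]) for j in range(hidden)]
            z = sum(w[out + j] * h[j] for j in range(hidden)) + w[out + hidden]
            y = 1.0 / (1.0 + math.exp(-z))
            loss += (y - target) ** 2 / len(data)
            # Backward pass: gradient of the mean squared error w.r.t. every parameter.
            dz = 2.0 * (y - target) * y * (1.0 - y) / len(data)
            for j in range(hidden):
                grads[out + j] += dz * h[j]
                dh = dz * w[out + j] * (1.0 - h[j] ** 2)
                grads[3 * j] += dh * x1
                grads[3 * j + 1] += dh * x2
                grads[3 * j + 2] += dh
            grads[out + hidden] += dz
        momentum_step(w, grads, v, lr, momentum)   # full-batch update with momentum
        losses.append(loss)
    return losses


if __name__ == "__main__":
    history = train_xor()
    print("first epoch loss:", history[0], "last epoch loss:", history[-1])
```

    Setting `momentum=0.0` reduces `momentum_step` to plain gradient descent, which makes it easy to compare the two on the same seed.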
  • Open

    Artificial intelligence system learns concepts shared across video, audio, and text
    A machine-learning model can identify the action in a video clip and label it, without the help of humans.  ( 6 min )

  • Open

    [D] Any tips for handling sparse, discrete, temporal data?
    Hi, I'm wondering if anyone could help familiarize me with modern feature engineering methods for handling this type of data in the context of machine learning. Background: The data I'm talking about is hospital/facility utilization data for patients. Below is a table that is similar to the data I have for each patient. In this example we can see that this single patient went to the ED on the 5th day, was admitted into the hospital on the 5th day, and was admitted to a SNF (Skilled Nursing Facility, basically a nursing home) on the 8th day:
            ED  Hospital  SNF
    Day 1   0   0         0
    Day 2   0   0         0
    Day 3   0   0         0
    Day 4   0   0         0
    Day 5   1   1         0
    Day 6   0   0         0
    Day 7   0   0         0
    Day 8   0   0         1
    Day 9   0   0         0
    In reality, we have more types of facilities as columns and years worth of days. Problem…  ( 2 min )
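    As a concrete illustration of the data layout described above, the sketch below encodes the example patient as daily 0/1 indicators and derives two simple summary features: trailing-window event counts and days since the last event. These two features are only an assumed starting point for this kind of sparse indicator data, not the poster's method, and the helper names `rolling_counts` and `days_since_last` are hypothetical.

```python
def rolling_counts(events, window):
    """Number of event days in the trailing `window` days, including the current day."""
    return [sum(events[max(0, i - window + 1): i + 1]) for i in range(len(events))]


def days_since_last(events):
    """Days elapsed since the most recent event; None before the first event occurs."""
    out, last = [], None
    for i, flag in enumerate(events):
        if flag:
            last = i
        out.append(None if last is None else i - last)
    return out


# The example patient from the post: nine days, an ED visit and a hospital
# admission on day 5, and an SNF admission on day 8 (lists are zero-indexed).
ed = [0, 0, 0, 0, 1, 0, 0, 0, 0]
hospital = [0, 0, 0, 0, 1, 0, 0, 0, 0]
snf = [0, 0, 0, 0, 0, 0, 0, 1, 0]

ed_last_3_days = rolling_counts(ed, window=3)
snf_recency = days_since_last(snf)
```

    Dense features like these turn a mostly-zero daily matrix into per-day values that a tabular model can consume directly; the window length is a free choice.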
    [P] Deploying my MIDI generator
    Not sure if this is the best place for this, as I am not a Machine Learning engineer. I have developed a web application that lets you generate and remix MIDI with NLP Transformers. Right now it runs on Flask. The model is fairly small as far as transformers go (500 MB) and doesn't use very much memory when running. It is only doing inference, so it runs for about 5-15s while processing a job; containerized, it uses less than 4 GB of memory at all times. I am trying to figure out how to deploy my Flask server to EC2 without incurring massive compute costs. It seems like all of the instances with GPU or Spot GPU support are prohibitively expensive and provide way more memory and local storage than I would need. Have any of you encountered a similar problem? Are there any instances that you suggest? Project here: https://github.com/pickles976/chiptune-ai submitted by /u/EuphoricFedoria [link] [comments]  ( 1 min )
    [D] Democratizing Diffusion Models - LDMs: High-Resolution Image Synthesis with Latent Diffusion Models, a 5-minute paper summary by Casual GAN Papers
    Diffusion models (DMs) have a more stable training phase than GANs and fewer parameters than autoregressive models, yet they are really resource intensive. The most powerful DMs require up to 1000 V100 days to train (that’s a lot of $$$ for compute) and about a day per 1000 inference samples. The authors of Latent Diffusion Models (LDMs) trace this problem to the high dimensionality of the pixel space in which the diffusion process occurs, and propose to perform it in a more compact latent space instead. In short, they achieve this feat by pretraining an autoencoder model that learns an efficient compact latent space that is perceptually equivalent to the pixel space. A DM sandwiched between the convolutional encoder-decoder is then trained inside the latent space in a more computationally-efficient way. In other words, this is a VQGAN with a DM instead of a transformer (and without a discriminator). As for the details, let’s dive in, shall we? Full summary: https://t.me/casual_gan/293 Blog post: https://www.casualganpapers.com/high-res-faster-diffusion-democratizing-diffusion/Latent-Disffusion-Models-explained.html Latent Diffusion Models arxiv / code Join the discord community and follow on Twitter for weekly AI paper summaries! submitted by /u/KirillTheMunchKing [link] [comments]  ( 1 min )
    [D] Training set size of SOTA models?
    Hi, I'm trying to get an idea of just how large the datasets for big ML models, such as DALL-E, are. Does anyone have some references? GPT-3 is apparently trained on 45TB of data: https://www.springboard.com/blog/data-science/machine-learning-gpt-3-open-ai How large are other big datasets? submitted by /u/rk3000 [link] [comments]
    [D] How to train large CLIP-style models?
    I'd like to train a medium-large contrastive model similar to CLIP. Something that will fit onto a single GPU but for which I can only compute a few batch elements at a time. The issue is that these types of models rely on other batch elements for the "contrastive" part of the equation. Bigger batch sizes lead to faster (and better) convergence. Is anyone aware of any papers (or have any hands-on tips) that address this issue? submitted by /u/neonbjb [link] [comments]  ( 1 min )
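    To make the batch dependence concrete, here is a small sketch of a symmetric contrastive (InfoNCE-style) loss of the kind CLIP-like models use. It is written in plain Python for readability and is only an illustration under stated assumptions: the function name `contrastive_loss`, the fixed temperature of 0.07, and the toy embeddings are choices made for this example, not details from the post.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def _logsumexp(row):
    """Numerically stable log(sum(exp(row)))."""
    m = max(row)
    return m + math.log(sum(math.exp(x - m) for x in row))


def _diag_cross_entropy(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    n = len(logits)
    return sum(_logsumexp(logits[i]) - logits[i][i] for i in range(n)) / n


def contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired image/text embeddings."""
    imgs = [_normalize(v) for v in image_emb]
    txts = [_normalize(v) for v in text_emb]
    # logits[i][j]: similarity of image i to text j; every other pair in the batch is a negative.
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts] for i in imgs]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (_diag_cross_entropy(logits) + _diag_cross_entropy(transposed))
```

    With a batch of one pair there are no negatives, so the loss is exactly zero and carries no training signal; that is the dependence on other batch elements the post describes.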
    [D] Focus on the Process: Formulating AI Ethics Principles More Responsibly
    Hi there, there is a new Gradient article some of you may find interesting: Focus on the Process: Formulating AI Ethics Principles More Responsibly Here's a preview of what it's about: It is tempting to respond to the present state in AI ethics by abandoning searches for principles. Given that there are so many principles out there already and so few tools to operationalize them, organizations might be inclined to simply use some of the existing principles and focus their attention on operationalization. However, a difficult question to answer is which principles to use. How do we know that organizations will choose well? What is to prevent them from cherry picking, for example? One way to go is to sift through the existing literature, looking for universal AI ethics principles. The hope might be that if we find universal principles, they could guide the development and evaluation of AI systems everywhere. Organizations that develop AI systems could focus on operationalizing them. Those who evaluate AI systems, such as investors, regulators, auditors, and consumers, could examine AI systems based on these principles. I advise against this approach. In this article, I explain why it is unlikely that universal AI ethics principles will be found and I discuss reasons to avoid using dominant trends as default. Instead, I suggest that each organization should articulate its own AI ethics principles, and I sketch ways to do so responsibly. Enjoy! submitted by /u/regalalgorithm [link] [comments]  ( 1 min )
    [D] Are multilayer perceptrons reversible?
    I have a multilayer perceptron that maps input X to output Y. Three hidden layers, with "tanh" activations in each, except the layer with linear (identity) activation leading the output Y. Can I reverse the network such that it would map Y to X? submitted by /u/boleslev [link] [comments]  ( 1 min )
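    A single tanh layer can be inverted in closed form only under fairly strict conditions: its weight matrix must be square and invertible, and its outputs must lie strictly inside (-1, 1). The sketch below shows that special case for a 2x2 layer. The functions `forward` and `inverse` and the example weights are hypothetical, and the code says nothing about layers that change width, where information is generally lost.

```python
import math


def forward(x, W, b):
    """One dense layer with tanh activation: y = tanh(W x + b)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bi) for row, bi in zip(W, b)]


def inverse(y, W, b):
    """Invert a 2x2 tanh layer: x = W^{-1} (atanh(y) - b). Requires det(W) != 0."""
    pre = [math.atanh(yi) - bi for yi, bi in zip(y, b)]   # undo tanh, then the bias
    (a, c), (d, e) = W
    det = a * e - c * d
    if det == 0:
        raise ValueError("W is singular, so the layer is not invertible")
    # Explicit inverse of a 2x2 matrix applied to the pre-activation vector.
    return [(e * pre[0] - c * pre[1]) / det, (-d * pre[0] + a * pre[1]) / det]


# Illustrative weights only.
W = [[1.0, 0.5], [-0.3, 2.0]]
b = [0.1, -0.1]
x = [0.3, -0.2]
x_recovered = inverse(forward(x, W, b), W, b)
```

    For a stack of such layers the inverses would be applied in reverse order, and a linear output layer is inverted the same way without the `atanh` step.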
    [D] Anyone working on Explainable AI?
    I am currently working at ML@Uber. Is anyone here dealing with the problem of explainable AI, i.e. how are you able to understand and interpret predictions made by your machine learning models? Is anyone here facing this problem, or has anyone already solved it? submitted by /u/gauravapiscean [link] [comments]  ( 1 min )
    [D] GCP Compute Engine pricing question
    I'm currently doing a little personal project which involves training a VQVAE (basically a conv NN and a variational autoencoder). I've got it all up and running after a few weeks of figuring things out and I'm finding that I'm being rinsed on costs. My basic setup is a c2-standard-4 VM (4 v-cpus and 16GB ram) and a 100GB disk mounted, I'm also spinning down the VM when not in use so I don't get a discount for continual usage. I started with an n1-standard-1 but found myself quickly running out of RAM, which still happens if my batch size is too high but less frequently. The disk is necessary as I'm dealing with large datasets but doesn't seem to be costing me much. As of right now my costs are about $4.5 per day, which obviously isn't that high, but for a personal project the costs soon rack up if I'm running this for days at a time every week. On my current setup I can run a batch size of 8, any higher than this and I run out of memory, and an epoch takes about 5 hours. The point I'm struggling with at the moment is can I be better optimising CPU vs runtime, i.e. is it worth swapping to slower/fewer but cheaper CPUs, or will it cost me just as much because they run longer? The other thing I'm not sure about is if I can gain much by using other platforms, my current setup isn't really that tied into the GCP ecosystem so I can switch fairly easily. I realise my questions are fairly general and everyone's setups are different, so I'm just looking for a point in the right direction if nothing else. Thanks! Edit: also if there's anywhere that can give me a decent amount of free hours I'd also be interested in that too! submitted by /u/Batteredcode [link] [comments]  ( 3 min )
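    The CPU-versus-runtime trade-off in the post comes down to simple arithmetic: cost per epoch is the hourly rate times the hours an epoch takes. The sketch below works that through with the figures quoted above, under the assumption that the $4.5 per day corresponds to 24 hours of VM uptime. The helper names and the example price/slowdown factors are hypothetical, not GCP prices.

```python
def run_cost(hourly_rate, hours):
    """Cost of keeping a VM up for `hours` at `hourly_rate` dollars per hour."""
    return hourly_rate * hours


def is_cheaper(rate_factor, slowdown):
    """True if a machine billed at `rate_factor` x the current rate, taking
    `slowdown` x as long per epoch, costs less per epoch than the current one."""
    return rate_factor * slowdown < 1.0


# Figures from the post; the 24h-uptime interpretation is an assumption.
current_hourly_rate = 4.5 / 24                        # dollars per hour
epoch_cost = run_cost(current_hourly_rate, hours=5.0)  # dollars per 5-hour epoch
```

    In other words, a cheaper machine only pays off when its price drops by a larger factor than its runtime grows: half the price at 1.5x the runtime saves money, half the price at 2.5x the runtime does not.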
    [D] Client-Side Caching Improves Feature Store Performance by 70% at DoorDash
    To enable our platform to support hundreds of data-driven models and produce billions of model predictions, we built a robust ML platform, feature store, and prediction engine. This was only the beginning, as the feature store at the heart of the platform utilized multiple TBs of memory in large Redis clusters, which needed to be optimized for cost and fast loading times for the optimal customer experience. To improve the feature store performance we implemented a caching layer, but still needed to choose the best caching library, implement this solution, and analyze the platform to set up experiments that would validate the new approach. I wanted to share this journey with the developer community so they can learn from my experience and how I was able to improve feature store performance by 70% at DoorDash. Please check out the article and let me know your thoughts on my approach: https://doordash.engineering/2022/05/03/how-we-applied-client-side-caching/ submitted by /u/csko7 [link] [comments]  ( 1 min )
    Twitter algorithm already open source?[D]
    I've been seeing some people talking about how Elon's going to make Twitter's algorithm open source, but isn't it already publicly available? At least for Who To Follow (WTF), so I don't get why people are losing their minds over this. submitted by /u/lehmanmafia [link] [comments]  ( 2 min )
    [R] [D] SERF activation function - improving Swish
    https://arxiv.org/abs/2108.09598 Paper looks very promising, what do you think? Anyone tried SERF yet? submitted by /u/Shronnin [link] [comments]  ( 1 min )
    Sound clip Recommendation Model [D]
    Hello everyone! So I am starting a project where I need to build a model that can identify segments of audio that indicate “problems with call center agents”. I have around 65,000 pieces of audio, all of which have labeled segments. I also have the transcribed text and the exact times where the audio was segmented. I am thinking of starting with Data2Vec. I would like to hear your opinions on how you would approach this. Thanks in advance! submitted by /u/JS-AI [link] [comments]  ( 1 min )
    [D] Why are graph neural networks applied to non-graph structured data?
    Recently I have come across several papers that use graph neural networks for processing images, audio and video data. I can understand why GNNs are needed for data that have a graph structure (like molecules or traffic network etc). But for images and audio, what is the difference between using GNNs versus normal DNN? submitted by /u/Far_Conversation_445 [link] [comments]  ( 2 min )
    [D] Would an ML Ops platform be useful?
    Hey folk, I was wondering, would a ML Ops platform be something the community would embrace? A tool where from a CLI or GUI you can easily deploy models in a productive environment, or expose them as microservices? Something that solves the infrastructure bits and configs is what I wonder.. Or do people prefer to simply handle their own infra or manage on premises machines? submitted by /u/zedrakk [link] [comments]  ( 1 min )
    [D] Handling Missing Data or Encoding/Scaling. What comes first?
    Just out of curiosity, I just want to know whether you handle missing data before scaling/encoding of the dataset or after it. Thank you, submitted by /u/trj_flash75 [link] [comments]  ( 1 min )
    [R] Meta is releasing a 175B parameter language model
    submitted by /u/StellaAthena [link] [comments]  ( 2 min )
  • Open

    Is it correct to say that a nn with n layers is an nth order approximation?
    submitted by /u/HasFiveVowels [link] [comments]
    Stanford has trained AI to classify proteins
    submitted by /u/aidev2040 [link] [comments]
    Hi, I'm looking to build a neural network from scratch for a recommender system that attempts to recommend anime programmes; could someone point me to relevant papers/books/resources?
    Hi, so for my final year project, I decided that I'm going to code a neural network from scratch for a recommender system; however, I don't know the first thing about this topic, so I'm going to be starting from the ground up. Could anyone point me in the right direction and recommend some resources? Thanks! submitted by /u/tvvvs [link] [comments]  ( 1 min )
    CNN ERROR Inconsistent number of samples
    I am building a CNN to detect DGA, DNS, but I have encountered this problem and just don't know how to fix it. https://preview.redd.it/mnkfhb2cn8x81.png?width=719&format=png&auto=webp&s=ac1ec84b0721f43f36d0548173d12642b7ba0f26 submitted by /u/Dangerous_Intern_510 [link] [comments]
    7+ Best Books to Learn Neural Networks in 2022 for Beginners (Updated)
    submitted by /u/maneesh123456 [link] [comments]
  • Open

    Strange Galaxies (A.I animation + sound design)
    submitted by /u/nenomancer [link] [comments]
    Focus on the Process: Formulating AI Ethics Principles More Responsibly
    submitted by /u/regalalgorithm [link] [comments]
    1914 - People engaged in winter sports activities near Wetzlar, Hesse, Germany [Colored using AI]
    submitted by /u/pheonix_bird [link] [comments]
    Stanford has trained AI to classify proteins
    submitted by /u/aidev2040 [link] [comments]
    Common Voice Has A New Dataset
    submitted by /u/limapedro [link] [comments]
    7 Best Natural Language Processing Courses (2022) | Best NLP Courses
    submitted by /u/maneesh123456 [link] [comments]
    Robots still not around us?… — mathematicians to blame
    submitted by /u/marvelmind_robotics [link] [comments]  ( 1 min )
    A Look at Machine Learning
    submitted by /u/lutipri [link] [comments]
    Generating alternate Spongebob theme songs with OpenAI Jukebox
    submitted by /u/PF-Wang [link] [comments]
    AI fitness app
    Hi guys, recently came across a new fitness app with artificial intelligence, looks interesting and promising, and do you think AI can replace real trainers? Here's the link QR Kinestex (google.com) submitted by /u/vovayoung [link] [comments]
    Zama Open-Sources Concrete ML v0.2 To Support Data Scientists Without Any Prior Cryptography Knowledge To Automatically Turn Classical Machine Learning (ML) Models Into Their FHE Equivalent
    Zama is a Paris-based startup that aims to bring end-to-end encryption to AI by enabling developers to use Python to create models that run on encrypted data. In late April, the Zama Team released the public alpha release of Concrete ML, a package developed on top of Concrete Numpy. This release provides data scientists with no prior knowledge of cryptography with simple APIs for automatically converting traditional machine learning (ML) models into their FHE equivalents. One of the main goals of this version is to make using Concrete ML as convenient as possible for users of popular machine learning frameworks. Model training for linear models and trees is not reimplemented in Concrete ML, allowing researchers to utilize several variations and features of these models that the scikit-learn package supports. The team conducted a series of experiments to compare the models between scikit-learn and Concrete ML. When tested on simple 2D linear models, FHE performance was comparable to that of its unencrypted scikit-learn equivalents. However, as the number of dimensions increases in the current release, the performance of strongly quantized classifiers rapidly falls, which will be refined in future releases. On encrypted data, tree-based classifiers utilizing Concrete ML show outstanding accuracy. Running tree models that demand heavy comparisons can be easily enabled thanks to Zama’s unique approach to FHE that provides Programmable Bootstrapping. As a result, tree-based models perform as well as their scikit-learn/xgboost counterparts in FHE. This remains true even for datasets with many dimensions, and tree-based models are typically the most performant when dealing with tabular data. Data scientists can implement Decision Trees, Random Forests, and Gradient Boosted Trees after the team publishes specific deployment APIs. Continue Reading Github: https://github.com/zama-ai/concrete-ml submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    AI painting Marvel superheroes
    submitted by /u/notrealAI [link] [comments]  ( 1 min )
  • Open

    Alpa: Automated Model-Parallel Deep Learning
    Posted by Zhuohan Li, Student Researcher, Google Research, and Yu Emma Wang, Senior Software Engineer, Google Core Over the last several years, the rapidly growing size of deep learning models has quickly exceeded the memory capacity of single accelerators. Earlier models like BERT (with a parameter size of < 1GB) can efficiently scale across accelerators by leveraging data parallelism in which model weights are duplicated across accelerators while only partitioning and distributing the training data. However, recent large models like GPT-3 (with a parameter size of 175GB) can only scale using model parallel training, where a single model is partitioned across different devices. While model parallelism strategies make it possible to train large models, they are more complex in that th…  ( 8 min )
  • Open

    The Scientist of the Scientist
    Science has been the most important tool of humanity even before the dawn of history. Humans have used science, without even knowing that…  ( 5 min )
  • Open

    Quartal melody: Star Trek fanfare
    Intervals of a fourth, such as the interval from C to F, are common in western music, but consecutive intervals of this size are not. Quartal harmony is based on intervals of fourths, and quartal melodies use a lot of fourths, particularly consecutive fourths. Maybe the most famous quartal melody is the opening fanfare to […] Quartal melody: Star Trek fanfare first appeared on John D. Cook.  ( 2 min )
  • Open

    What are some top venues for the submission of a Reinforcement learning related paper?
    submitted by /u/blitzkreig3 [link] [comments]  ( 1 min )
    Explanation of Optimal Policies and Value Functions | Dynamic Programming
    submitted by /u/shani_786 [link] [comments]
  • Open

    Rethinking Human-in-the-Loop for Artificial Augmented Intelligence
    How do we build and evaluate an AI system for real-world applications? In most AI research, the evaluation of AI methods involves a training-validation-testing process. The experiments usually stop when the models have good testing performance on the reported datasets because real-world data distribution is assumed to be modeled by the validation and testing data. However, real-world applications are usually more complicated than a single training-validation-testing process. The biggest difference is the ever-changing data. For example, wildlife datasets change in class composition all the time because of animal invasion, re-introduction, re-colonization, and seasonal animal movements. A model trained, validated, and tested on existing datasets can easily be broken when newly collected dat…  ( 5 min )
  • Open

    ‘In the NVIDIA Studio’ Welcomes Concept Designer Yangtian Li
    This week In the NVIDIA Studio, we welcome Yangtian Li, a senior concept artist at Singularity6. Li is a concept designer and illustrator who has worked on some of the biggest video game franchises, including Call of Duty, Magic: the Gathering and Vainglory. Her artwork also appears in book illustrations and magazines. The post ‘In the NVIDIA Studio’ Welcomes Concept Designer Yangtian Li appeared first on NVIDIA Blog.  ( 3 min )

  • Open

    [D] How to do unsupervised anomaly detection in a principled way?
    I have terabytes of tabular and image data. I want to determine what data points are novel/anomalous. I have a couple of anchor data points for entities that are known and features about said entities. Effectively I want to find things that are novel relative to my anchors without any labels for if something is actually interesting and not noise to train on. Here's an analogy: I have driving footage of a reference driver who is "perfectly" driving. I also have all of the sensor data: accelerometers, where the human is looking, etc. I simulate 1000s of self-driving cars that are running their own agent models. I get data on all of these as well. I deploy an agent to the real world that is expected to do better/comparable to the reference driver while having different model parameters. I don't know how this driving agent will do in the real world. I want to choose agent instantiations that are different from the rest in some regard (anomaly). I'm constrained by how many deployments I can make. It's expensive and dangerous. An issue I face with this is that there are dozens of models that will give me anomalous points, but sometimes they don't overlap at all. There's no consistency. Ideally I want to find data points that are novel in feature space (far away?), but expected to be functional... on non-labeled data. So yea... after writing this it kind of sounds like just a screwed situation, but maybe someone here has an idea/experience of how to build a system that can enable this kind of parameter selection engine (suggesting novel parameters that are functional without any labeled data). submitted by /u/memproc [link] [comments]  ( 1 min )
    [P] Transfer Learning with BERT, number of examples.
    I'm looking to make a review classifier (3 classes). It sounds like transfer learning is standard these days. But how many examples would generally be needed to improve on the out-of-the-box result? Would 1,000 be sufficient, or realistically would it need to be in the tens of thousands? Or is it a "how long is a piece of string" question? edit: sorry wrong tag submitted by /u/mldude8 [link] [comments]  ( 1 min )
    [D] Present ML classification model results for non technical
    I want to include machine learning classification results (string target) in a web platform. What information can I show that will be meaningful? I can already think of predicted value, accuracy, and probability. What else can I include, such as graphics or anything else useful, to present my results to non-technical people (clients)? submitted by /u/According-Promise-23 [link] [comments]  ( 1 min )
    [D] How do some people publish so much in this field?
    I'm not talking about PIs necessarily, but sometimes I search up other grad students in my department or people who email me and some of them have 10+ first author papers a year? I don't really understand how this is possible; I don't think I'm the best implementer out there but it would take me at least 2-3 months to read the literature, come up with a new research question, run experiments, iterate and write up. I mostly work with just me and my advisor though, maybe that's the difference? If you're one of these people, are you just hyper-productive? Already supervising students? Or collaborating with a ton of people and not doing a high proportion of work on any project? submitted by /u/IPvIV [link] [comments]  ( 5 min )
    [P] Pretraining dense retrievers with masked language model objective(REALM)
    Hi, I made a video explaining REALM. It is a pretraining method for dense retrievers. It uses a language model along with a retriever for pretraining. Given a random masked sentence like "Each angle in an equilateral triangle is [MASK]", the retriever gets top passages that might contain information about equilateral triangles. The passages are then passed to a language model to predict the value for each "[MASK]" token. Using this MLM objective, as model performance improves so does the quality of retrieval. A simple and effective idea for pretraining. This is the final video of our series on Open-domain question answering using dense retrievers. I will appreciate any feedback. Thanks for the support till now. https://www.youtube.com/watch?v=aQcoI1t6HOs submitted by /u/infiniteakashe [link] [comments]  ( 1 min )
    [D] Where did MT-NLG go wrong with their scaling experiments, comparing its capabilities to PaLM?
    The MT-NLG model was 530B parameters compared to PaLM's 540B. They seem to have done things correctly from what I skimmed. However, their model is neither that impressive on benchmarks, nor does it demonstrate any special capabilities. So what was the reason MT-NLG didn't work as well as expected? Is it possible it has abilities to explain jokes (on par with PaLM) but they were undiscovered by the authors? Or are there any gaping flaws in how they scale the different hyperparameters (heads, layers, dims etc.)? Perhaps such an analysis has already been done, but I would love to see what you guys think about why it underperformed... In such an unknown area as this, it seems that unless one scales models with multiple attempts it's hard to accurately judge when we would have reached the point where scaling laws fall off. submitted by /u/Competitive-Rub-1958 [link] [comments]  ( 1 min )
    [R] How to access CVPR 21 workshop extended abstracts?
    Hi everyone, I intend to submit a short paper to a 2022 CVPR workshop. I wanted to get inspiration from other CVPR short abstracts to see how they are written, their structure and so on. However, I'm really struggling a lot to find a place on the internet where I can download workshop extended abstracts from. Does anyone know where one can download the 2021 CVPR workshop short papers? submitted by /u/BigDataOverflow [link] [comments]  ( 1 min )
    [D] Train model to predict continuous variable ranging from 0 to 1 using images as input.
    I have thousands of images of structures that I am trying to use to train a NN to predict their parameters. The inputs are the 2D images of the structure, and the output is the 1 parameter I am trying to predict (I have 2 parameters I am trying to predict but one is fine too). I have tried using the dimensions of the structure to predict the parameters but the error was too high. What do you recommend I do for this? Links are appreciated submitted by /u/ftority [link] [comments]  ( 1 min )
    [D] How do you test (unit, integration) your Machine Learning models/pipelines?
    First things first, do you test them at all? Practices can differ across companies. Secondly, I believe that the testing process can differ based on the model use case (CV, NLP...), but is there any unified set of recommendations or good practices? Can general approaches to unit and integration testing be applied to ML models/pipelines? What is your approach to this? If you feel like writing, maybe you could write in the comments: Area of ML that you are talking about: traditional ML algorithms, CV models, NLP models. What are you testing? Is it the validity of the forward pass? (example: unit testing the shapes) Is it the validity of the training loop? (somehow) Are you testing if your model's performance is above some threshold? (example: accuracy, F1, BLEU, maybe execution time?) Are you testing if your model is making correct predictions on some crucial cases? Are you testing important metrics related to your dataset? (example: if it's properly standardized) Some of these things can be tested pretty easily and don't require automated tests, but it is nice to have them. Do you have some other ways to do sanity checks? Please feel free to add other aspects of ML models/pipelines which are worth testing. Looking forward to your insights! submitted by /u/Icy_Fisherman7187 [link] [comments]  ( 2 min )
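    Two of the checks listed in the post (forward-pass shapes and dataset standardization) can be sketched in a few lines. This is only an illustration under stated assumptions: `forward` and `standardize` are hypothetical stand-ins for a real model and preprocessing step, not code from the thread.

```python
import math


def standardize(column):
    """Hypothetical preprocessing step: scale a column to zero mean, unit variance."""
    mean = sum(column) / len(column)
    var = sum((v - mean) ** 2 for v in column) / len(column)
    std = math.sqrt(var) or 1.0  # avoid dividing by zero on a constant column
    return [(v - mean) / std for v in column]


def forward(weights, batch):
    """Hypothetical toy linear layer: batch (n x d) times weights (d x k)."""
    return [[sum(x * w for x, w in zip(row, col)) for col in zip(*weights)]
            for row in batch]


def check_forward_shape():
    """Unit-style check: n input rows must give n output rows of width k."""
    weights = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # d = 3, k = 2
    batch = [[1.0, 2.0, 3.0]] * 4                    # n = 4
    out = forward(weights, batch)
    assert len(out) == 4 and all(len(row) == 2 for row in out)
    return out


def check_standardized(column, tol=1e-9):
    """Dataset check: the standardized column has mean 0 and variance 1."""
    scaled = standardize(column)
    mean = sum(scaled) / len(scaled)
    var = sum((v - mean) ** 2 for v in scaled) / len(scaled)
    return abs(mean) < tol and abs(var - 1.0) < tol
```

    In a real project these would typically live in a test runner such as pytest, with the toy functions replaced by the actual model and pipeline.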
    [D] What program/software can make animations like this?
    Hi, looking for advice for software/programs that can make animations like this one - thanks! https://upload.wikimedia.org/wikipedia/commons/transcoded/9/92/Infinitely_wide_neural_network.webm/Infinitely_wide_neural_network.webm.360p.vp9.webm submitted by /u/unital [link] [comments]
    [R] A very preliminary analysis of DALL-E 2
    submitted by /u/hardmaru [link] [comments]  ( 1 min )
    [P] The easiest way to process and tag video data
    submitted by /u/happybirthday290 [link] [comments]  ( 3 min )
    [D] How to effectively sample from high dimensional space and create data-efficient training.
    So I have been working on generating data and creating a neural network to predict deformation in meshes, given some mesh parameters like thickness, elasticity, point of force, etc. What I know for sure is that the parameters I am creating a dataset for are/will be in a range, and some meshes will have the same deformation for some combinations of different values of these parameters. What I have tried is to sample uniformly from each of these parameters, but this seems very data inefficient because there might be some blind spots in the combined sampling, say deformation for a particular combination of parameters can be different and unseen. My question is: how do you efficiently sample from a large multi-dimensional space? Is there a better training method that would somehow inform another network to sample efficiently? I will be happy to explain more if this is somehow unclear. submitted by /u/bitemenow999 [link] [comments]  ( 1 min )
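    One common alternative to independent uniform sampling is Latin hypercube sampling, which guarantees that every parameter's range is covered evenly. The sketch below is not from the post; it is a minimal standard-library version, and the parameter names and bounds in the usage lines are made up for illustration.

```python
import random


def latin_hypercube(n_samples, bounds, seed=0):
    """Draw n_samples points so that each dimension has exactly one point
    in each of n_samples equal-width strata.

    bounds is a list of (low, high) pairs, one per parameter.
    """
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        strata = list(range(n_samples))
        rng.shuffle(strata)  # decouple the strata order between dimensions
        width = (high - low) / n_samples
        # One uniformly random point inside each stratum.
        columns.append([low + (s + rng.random()) * width for s in strata])
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


# Hypothetical ranges: thickness in [1, 5], elasticity in [0, 1].
samples = latin_hypercube(8, [(1.0, 5.0), (0.0, 1.0)])
```

    This removes blind spots along each axis but not necessarily in every joint region; adaptive schemes that pick the next sample where the model is most uncertain are a separate technique and are not shown here.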
    [D] Multiple PDF Files Similarity
    Hi everyone, I am developing an application in which the user uploads multiple PDF files and the application then groups those PDFs according to a similarity percentage entered by the user. If the user enters 80%, documents with at least 80% similarity are grouped into batch 1, and so on (similar PDF files are grouped into batches, i.e. batch 1, batch 2, ...). The PDF documents can be text, images, or a mix of both. I am using .NET Core and Angular. I tried the k-means algorithm, but the results are different every time because the algorithm picks random files as centroids. How can I implement this? submitted by /u/ConsciousSlice4329 [link] [comments]  ( 1 min )
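    Because the requirement is a fixed similarity threshold rather than a fixed number of clusters, a deterministic threshold-based grouping may fit better than k-means. The sketch below swaps in a different technique (bag-of-words cosine similarity plus union-find grouping); it assumes the text has already been extracted from each PDF, ignores image content, and is written in Python rather than .NET purely for illustration.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two texts using raw word counts."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


def group_by_similarity(texts, threshold):
    """Group document indices whose pairwise similarity meets the threshold.

    Groups are connected components, so documents can be chained together
    through intermediate documents. No randomness is involved, so the same
    input always gives the same batches.
    """
    parent = list(range(len(texts)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(texts)):
        for j in range(i + 1, len(texts)):
            if cosine_similarity(texts[i], texts[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(texts)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

    If k-means is still preferred, fixing the random seed of the initialization is the usual way to make its output repeatable.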
  • Open

    Robert Magno (Run:AI) - Building the Best AI Infrastructure Stack to Accelerate Your Data Science
    submitted by /u/Dracutela [link] [comments]
    Weekly China AI News: Beijing Issues First-Ever Driverless Robotaxi Permits; Huawei Anticipates AI Compute to Jump 500 Times by 2030; CogView2 Challenges DALL-E-2 With Better Results
    submitted by /u/trcytony [link] [comments]  ( 1 min )
    How many years away do you think we are from AI that can turn our old games into something that looks like UE5?
    To me it seems to make sense that we would begin relying more and more on AI to improve our old games. For example, ffxiv won't be getting raytracing anytime soon, but I imagine it's only a matter of time before I could download some AI-based image software (like Dall-E) that lets me tweak my gameplay experience to look much more realistic. Do you think I'm right in assuming this is the likely trajectory of AI's use in game graphics? And if the answer to 1 is yes, how far away do you think such technology is? Thank you! submitted by /u/solidwhetstone [link] [comments]  ( 1 min )
    Do you think that AI will ever be as smart as humans (AGI)? If you do think so, when or around what time period?
    Hey guys. I'm working on a project for school and would like to hear your guy's opinions as the leading AI subreddit on whether or not AGI will be achieved, and if so, when. I'd greatly appreciate your input! submitted by /u/SurroundSwimming3494 [link] [comments]  ( 3 min )
    Last Week in AI: AI helps model volcanoes, Anthropic gets $580M for more explainable AI, AI algorithms that screen for child neglect, and more!
    submitted by /u/regalalgorithm [link] [comments]  ( 1 min )
    What's the involvement of AI in Transforming the Media & Entertainment
    Artificial intelligence (AI) will continue to disrupt the media sector, just as it did in 2020 and 2021. AI will most likely fulfill the three critical roles of recommendation, speech recognition, and media automation in this market. Read more: Artificial Intelligence will Continue to Transform the M&E Landscape submitted by /u/JencyJane [link] [comments]  ( 1 min )
    Artificial Intelligence Implications: The Future of Modern Wargaming! (Would love some feedback on a University blog post surrounding Wargaming and the use of AI!)
    submitted by /u/RvZz11 [link] [comments]  ( 1 min )
    5 Best Machine Learning Courses for Beginners, Advanced learn in 2022 -
    submitted by /u/maneesh123456 [link] [comments]
    How to deepfake an audio where you just change the voice but keep what’s been said.
    For example: The scene from Taxi Driver where the protagonist speaks with himself in front of the mirror, how to keep what he’s saying but change his voice for Morgan Freeman’s voice. submitted by /u/Accomplished-Door-61 [link] [comments]  ( 1 min )
    Web Scraping with Python - Learning the Basics | Rubik's Code
    submitted by /u/RubiksCodeNMZ [link] [comments]
    Spoofing detector using YoloV4 Tiny 3L
    submitted by /u/Gloomy_Recognition_4 [link] [comments]  ( 1 min )
    AI2 Open-Sources ‘LM-Debugger’: An Interactive Tool For Inspection And Intervention In Transformer-Based Language Models
    In natural language processing, a language model is a probabilistic statistical model that calculates the likelihood of a specific sequence of words appearing in a phrase based on the preceding words. As a result, it’s common in predictive text input systems, speech recognition, machine translation, and spelling correction, among other applications. Language models are a way of converting qualitative text information into quantitative data that machines can interpret. Modern NLP models rely on transformer-based language models (LMs). However, much more research is needed into how they construct their predictions internally. Unclear prediction behavior becomes an obstacle for both end-users who don’t comprehend why a model generates certain predictions and developers who want to diagnose or fix model behavior. A new paper published by a group of researchers from Allen Institute for AI, Tel Aviv University, Bar-Ilan University, and the Hebrew University of Jerusalem introduces LM-Debugger, an interactive open-source tool for fine-grained interpretation and intervention in LM predictions. This work will increase the transparency of LMs. Continue Reading Paper: https://arxiv.org/abs/2204.12130 Github: https://github.com/mega002/lm-debugger submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    A new Trading AI?
    Hello everybody. I have this idea for a trading bot but realised it's more in the realm of AI. I'm highly uneducated in this field and don't know what I'm talking about. If possible, could someone please answer a few questions? I have found a super successful day trader that I've been learning from for a couple of months. He's supposedly "reverse engineered the markets". I've drawn the trend lines and done the technical analysis he's taught me and it's scarily accurate. My only problem is that day trading takes time. So if I could have a bot understand his "science" and do it while I'm not there, it would be a free money hack lol. Q1: How hard is it on a scale of 1-10 to develop/create a trading AI? Q2: Could someone estimate what a project like this would cost me? Q3: Has a trading AI ever been created? I really appreciate it if you read through this and I would really, really appreciate answers. Thanks :) submitted by /u/just_conor12 [link] [comments]  ( 3 min )
    Poker AI Plays Itself
    submitted by /u/bluboxsw [link] [comments]
    AI News | AI Powered Robotic Boat Autonomously Cleans Harbors And Rivers | AI Cataract Detection | Machine Learning To Only Propose Molecules Which Are Synthesizable In Lab
    submitted by /u/getrich_or_diemining [link] [comments]  ( 1 min )
    Researchers From MIT and Cornell Develop STEGO (Self-Supervised Transformer With Energy-Based Graph Optimization): A Novel AI Framework That Distills Unsupervised Features Into High-Quality Discrete Semantic Labels
    Unsupervised semantic segmentation seeks to uncover and localize semantically significant categories within image corpora without any annotation. However, there are several challenges in creating annotated training data. These challenges frequently outweigh semantic segmentation methods’ superior accuracy. Algorithms must develop features for every pixel that are both semantically relevant and compact enough to form discrete clusters to extract meaningful categories without any annotation from the training data. A team of researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), Google, and Cornell University has achieved this by creating a machine learning model named STEGO (Self-supervised Transformer with Energy-based Graph Optimization) that surpasses previous methods by decoupling feature learning from cluster compactification. A frozen backbone makes up STEGO, and it serves as a source of learning feedback and input to the segmentation head for predicting distilled characteristics. This segmentation head is a direct feed-forward network with a ReLU activation function. Unlike earlier studies, the algorithm’s efficiency was increased without retraining or fine-tuning the backbone. The STEGO neural network retrieves global image information by pooling spatial variables in a global average. Then, based on the cosine similarity in the backbone’s feature space, a lookup table is computed for each image’s K-Nearest Neighbours. Continue Reading Paper: https://arxiv.org/pdf/2203.08414.pdf Github: https://github.com/mhamilton723/STEGO submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
  • Open

    What background knowledge is needed to understand traditional exploration methods?
    I can understand most traditional and current methods of reinforcement learning; however, when it comes to traditional exploration methods such as UCB or PUCB I am mostly lost. I can understand the algorithms intuitively when explained, but when I go over the papers and look at the proofs/explanations there seems to be some background knowledge I am not privy to. The book Reinforcement Learning: An Introduction doesn't seem to go over these in depth either. BTW I have a bachelor's and master's degree in Computer Science and have taken math classes such as statistics, calculus, LA, algorithms, etc. submitted by /u/Stochastic_Machine [link] [comments]  ( 1 min )
    Help needed to understand Rllib attention model parameters
    Hi! I recently started to learn about using attention (transformers). I use the rllib implementation (https://docs.ray.io/en/latest/rllib/rllib-models.html#default-model-config-settings), which follows some paper. So far I have some understanding of transformers but I still don't understand everything. When playing with the parameters I saw a couple for which I don't understand how they affect the learning and what exactly they do (I looked at the code and the paper but still don't understand). Can somebody explain what these are and how they affect the learning: 1) attention_memory_inference 2) attention_memory_training (also related to the previous point: when training, why are attention_memory_training and the input sequence concatenated? https://github.com/ray-project/ray/blob/master/rllib/models/torch/attention_net.py#L91 ) Many thanks :D submitted by /u/HBlackwooder [link] [comments]  ( 1 min )
    Is this how a simple parallel environment training works in DDPG/TD3 model?
    I tried to figure out how to do parallel environment training using TD3 or DDPG but just got bits and pieces of information, so this is what I came up with. This is what I have (a Unity "ant" environment): One ant in one environment But I want this: Parallel ants in multiple environments As a result, I can learn faster because I have more data. So here's how I envision parallel training: Agents collect actions from one TD3/DDPG model. Collect observations from all agents and put them into the memory buffer. Do 1 and 2 steps until every agent finishes its episode. If the agent finished its episode ahead of other agents, it simply waits. Then comes the training DDPG/TD3 step. And repeat. Would this type of training even work? submitted by /u/Dougller [link] [comments]  ( 1 min )
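    The collection scheme described in the post can be sketched as a plain loop over several environments that share one policy and one replay buffer. Everything here is a hypothetical stand-in: `ToyEnv` replaces the Unity ant environment, `policy` replaces the shared TD3/DDPG actor (without exploration noise), and the training step itself is omitted.

```python
import random
from collections import deque


class ToyEnv:
    """Hypothetical stand-in for one environment instance."""

    def __init__(self, episode_length, seed):
        self.episode_length = episode_length
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return self.rng.random()

    def step(self, action):
        self.t += 1
        obs = self.rng.random()
        reward = -abs(action - obs)
        done = self.t >= self.episode_length
        return obs, reward, done


def policy(obs):
    """Placeholder for the single shared actor network."""
    return 0.5 * obs


def collect_round(envs, buffer):
    """Step all environments with the shared policy until every episode ends.

    Environments that finish early are skipped ("wait") until the rest finish.
    """
    observations = [env.reset() for env in envs]
    finished = [False] * len(envs)
    while not all(finished):
        for i, env in enumerate(envs):
            if finished[i]:
                continue
            action = policy(observations[i])
            next_obs, reward, done = env.step(action)
            buffer.append((observations[i], action, reward, next_obs, done))
            observations[i] = next_obs
            finished[i] = done
    return len(buffer)


buffer = deque(maxlen=10_000)
envs = [ToyEnv(episode_length=n, seed=n) for n in (3, 5, 4)]
collect_round(envs, buffer)  # a TD3/DDPG update on sampled batches would follow
```

    Since the buffer only stores individual transitions, an off-policy learner does not strictly need episodes to be aligned; the waiting step is kept here only because the post describes it.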
    Unity with MLAgents, Isaac Gym, OpenAI Gym and other environments to experiment with reinforcement learning
    Hello, I am a master's student in computer science and I am specializing in artificial intelligence. I am approaching reinforcement learning for the first time in an intelligent robotics course, and would like to experiment with these techniques in a simulated environment. I'm looking for something that is not too complex to learn (I want to focus more on algorithm implementation and simulation rather than spending months figuring out the tool) and that gives enough freedom to modify and implement. Unity seemed interesting because a possible application of these techniques that I would like to explore is in the world of gaming and simulation, but I do not know if a solid knowledge of Unity is required to set up a decent environment before you can get into the experiments. Moreover, I don't know up to what level of detail is possible to enter in the case of MLAgents and how much is possible to customize. Do you recommend one of these that I proposed in the title or also other? Also, do you recommend some material (books, videos, courses, tutorials) to study that is a good compromise between explanation of the tool and theoretical part? Thanks in advance! submitted by /u/Parruck [link] [comments]  ( 2 min )
    creating custom environment for training RL agent
    Hi, I'm working on a project on 2d image reassembly using reinforcement learning. I want to create a custom (puzzle) environment for doing RL. Is there anyway to create a custom environment that can be integrated with gym. If so, how to create one? Thanks. submitted by /u/Praveen_Raja22 [link] [comments]  ( 1 min )
  • Open

    Achieve hyperscale performance for model serving using NVIDIA Triton Inference Server on Amazon SageMaker
    Machine learning (ML) applications are complex to deploy and often require multiple ML models to serve a single inference request. A typical request may flow across multiple models with steps like preprocessing, data transformations, model selection logic, model aggregation, and postprocessing. This has led to the evolution of common design patterns such as serial inference […]  ( 15 min )
    Build a corporate credit ratings classifier using graph machine learning in Amazon SageMaker JumpStart
    Today, we’re releasing a new solution for financial graph machine learning (ML) in Amazon SageMaker JumpStart. JumpStart helps you quickly get started with ML and provides a set of solutions for the most common use cases that can be trained and deployed with just a few clicks. The new JumpStart solution (Graph-Based Credit Scoring) demonstrates […]  ( 8 min )
    Increase your content reach with automated document-to-speech conversion using Amazon AI services
    Reading the printed word opens up a world of information, imagination, and creativity. However, scanned books and documents may be difficult for people with vision impairment and learning disabilities to consume. In addition, some people prefer to listen to text-based content versus reading it. A document-to-speech solution extends the reach of digital content by giving […]  ( 8 min )
  • Open

    Do you usually build neural networks from scratch or do you use a library dedicated to it?
    I'm curious, I've seen both and while the ones using libraries sound harder to make, the ones made from scratch sound slightly sloppier. For context, I mean for predator-prey simulations and such, not image to text recognition or anything that hard. submitted by /u/HesAMagicalPoney55 [link] [comments]  ( 1 min )
    [Question] Interval Branch and Bound
    Hello everyone, I'm studying the Branch and Bound algorithm, and I have a question (Reddit looks like the perfect place to ask other experienced people). I already have a working Branch and Bound algorithm but I want to adapt it to work with intervals instead of one single value. How do I do it? In Branch and Bound I have something like this: for each sample, I calculate the costs and then run the B&B algorithm with those and my values: if (x - y < 0), keep going with the tree; else stop the search. But when adapting to Interval Branch and Bound it would be something like: if ((x + ∆x) - (y + ∆y) < 0), keep going with the tree; else stop the search. And then I still need to check the other bound (x - ∆x). Will I need to calculate the costs twice (once for each bound) and also search the tree once for each cost? I may be making a really big mistake, but that's why I'm here. Can someone help clarify my problem? Thank you for your attention. submitted by /u/Zarathos_PT [link] [comments]  ( 1 min )
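    With standard interval arithmetic the difference of two intervals is [x.lo - y.hi, x.hi - y.lo], so both bounds can be handled in a single tree search instead of two. The sketch below assumes the costs are symmetric intervals x ± ∆x and y ± ∆y as in the post; the three-way decision, and in particular the "undecided" outcome, is an assumption about how the search might treat overlapping intervals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    @classmethod
    def around(cls, center, radius):
        """Build the interval [center - radius, center + radius]."""
        return cls(center - radius, center + radius)

    def __sub__(self, other):
        # Smallest possible difference pairs our lo with their hi, and vice versa.
        return Interval(self.lo - other.hi, self.hi - other.lo)


def branch_decision(x, y):
    """Interval version of the scalar test `x - y < 0`."""
    diff = x - y
    if diff.hi < 0:
        return "continue"   # x - y is negative for every value in the intervals
    if diff.lo >= 0:
        return "stop"       # x - y is non-negative for every value
    return "undecided"      # intervals overlap; exploring further is the safe choice
```

    For example, `branch_decision(Interval.around(1, 0.5), Interval.around(3, 0.5))` compares [0.5, 1.5] with [2.5, 3.5], whose difference [-3, -1] lies entirely below zero.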
    Future Computers Will Be Radically Different
    submitted by /u/keghn [link] [comments]
  • Open

    Why target ads at pregnant women
    I’m listening to a podcast interviewing Neil Richards, the author of Why Privacy Matters. Richards makes a couple interesting points about the infamous example of Target figuring out which women were pregnant based on their purchase history. First, pregnancy is a point at which women are open to trying new things. So if a company […] Why target ads at pregnant women first appeared on John D. Cook.  ( 1 min )
    Curiously simple approximations
    As I’ve written about here and elsewhere, the following simple approximations are fairly accurate. log10 x ≈ (x - 1)/(x + 1), loge x ≈ 2(x - 1)/(x + 1), log2 x ≈ 3(x - 1)/(x + 1). It’s a little surprising that each is as accurate as it is, but it’s also surprising that the approximations for […] Curiously simple approximations first appeared on John D. Cook.  ( 1 min )
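    The three approximations are easy to compare against the exact logarithms. The sketch below is a quick check, not code from the post; the sample points are arbitrary choices near x = 1, where the rational expression (x - 1)/(x + 1) is most accurate.

```python
import math


def ratio(x):
    """The shared rational expression (x - 1)/(x + 1)."""
    return (x - 1) / (x + 1)


def approx_log10(x):
    return ratio(x)


def approx_ln(x):
    return 2 * ratio(x)


def approx_log2(x):
    return 3 * ratio(x)


def errors(x):
    """Absolute errors of the three approximations at x."""
    return (
        abs(approx_log10(x) - math.log10(x)),
        abs(approx_ln(x) - math.log(x)),
        abs(approx_log2(x) - math.log2(x)),
    )


for x in (0.5, 1.0, 1.5, 2.0):
    print(x, errors(x))
```

    All three are exact at x = 1, and the base-2 version is also exact at x = 2, since 3(2 - 1)/(2 + 1) = 1.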
  • Open

    Mown Away: Startup Rolls Out Autonomous Lawnmower With Cutting-Edge Tech
    Jack Morrison and Isaac Roberts, co-founders of Replica Labs, were restless two years after their 3D vision startup was acquired, seeking another adventure. Then, in 2018, when Morrison was mowing his lawn, it struck him: autonomous lawn mowers. The two, along with Davis Foster, co-founded Scythe Robotics. The company, based in Boulder, Colo., has a Read article > The post Mown Away: Startup Rolls Out Autonomous Lawnmower With Cutting-Edge Tech appeared first on NVIDIA Blog.  ( 3 min )
    Meet the Omnivore: 3D Artist Creates Towering Work With NVIDIA Omniverse
    Edward McEvenue grew up making claymations in LEGO towns. Now, he’s creating photorealistic animations in virtual cities, drawing on more than a decade of experience in the motion graphics industry. The post Meet the Omnivore: 3D Artist Creates Towering Work With NVIDIA Omniverse appeared first on NVIDIA Blog.  ( 3 min )
  • Open

    Business Model Transformation: Keys to Monetizing the Edge
    I was conducting a Tech Talk at a client when I mentioned the astronomical growth of data at “the edge”; that data creation at the edge is growing almost as fast as that in the cloud according to IDC. This sent a noticeable ripple across the executive team audience.  The executives immediately began debating “How… Read More »Business Model Transformation: Keys to Monetizing the Edge The post Business Model Transformation: Keys to Monetizing the Edge appeared first on Data Science Central.  ( 5 min )
    Are PDF Documents a Thing of the Past?
    There have been many articles predicting the death of the PDF format, invented in 1993. Some of these articles are 10 years old: you can find them by googling “death of PDF”. With the advent of fluid or liquid layout design on almost every website, widespread Internet browsing on small devices (cell phones), notebooks combining… Read More »Are PDF Documents a Thing of the Past? The post Are PDF Documents a Thing of the Past? appeared first on Data Science Central.  ( 4 min )
  • Open

    10 Ways Technology is Changing Healthcare: How Innovation is Impacting the Medical Industry
    Source: Managed Healthcare Executive  ( 6 min )
  • Open

    Using Kaggle in Machine Learning Projects
    You’ve probably heard of Kaggle data science competitions, but did you know that Kaggle has many other features that can help you with your next machine learning project? For people looking for datasets for their next machine learning project, Kaggle allows you to access public datasets by others and share your own datasets. For those […] The post Using Kaggle in Machine Learning Projects appeared first on Machine Learning Mastery.  ( 7 min )
  • Open

    On the Optimization of Margin Distribution. (arXiv:2204.14118v1 [cs.LG])
    Margin has played an important role in the design and analysis of learning algorithms during the past years, mostly through the maximization of the minimum margin. Recent years have witnessed an increasing number of empirical studies on the optimization of margin distribution according to different statistics such as medium margin, average margin, margin variance, etc., whereas there is a relative paucity of theoretical understanding. In this work, we take one step in this direction by providing a new generalization error bound, which is closely tied to margin distribution through ingredients such as average margin and semi-variance, a new margin statistic for the characterization of margin distribution. Inspired by the theoretical findings, we propose the MSVMAv, an efficient approach to achieve better performance by optimizing margin distribution in terms of its empirical average margin and semi-variance. We finally conduct extensive experiments to show the superiority of the proposed MSVMAv approach.  ( 2 min )
    Intuitive Shape Editing in Latent Space. (arXiv:2111.12488v2 [cs.CV] UPDATED)
    The use of autoencoders for shape editing or generation through latent space manipulation suffers from unpredictable changes in the output shape. Our autoencoder-based method enables intuitive shape editing in latent space by disentangling latent sub-spaces into style variables and control points on the surface that can be manipulated independently. The key idea is adding a Lipschitz-type constraint to the loss function, i.e. bounding the change of the output shape proportionally to the change in latent space, leading to interpretable latent space representations. The control points on the surface that are part of the latent code of an object can then be freely moved, allowing for intuitive shape editing directly in latent space. We evaluate our method by comparing to state-of-the-art data-driven shape editing methods. We further demonstrate the expressiveness of our learned latent space by leveraging it for unsupervised part segmentation.  ( 2 min )
    Network Topology Optimization via Deep Reinforcement Learning. (arXiv:2204.14133v1 [cs.NI])
    Topology impacts important network performance metrics, including link utilization, throughput and latency, and is of central importance to network operators. However, due to the combinatorial nature of network topology, it is extremely difficult to obtain an optimal solution, especially since topology planning in networks also often comes with management-specific constraints. As a result, local optimization with hand-tuned heuristic methods from human experts is often adopted in practice. Yet, heuristic methods cannot cover the global topology design space while taking into account constraints, and cannot guarantee to find good solutions. In this paper, we propose a novel deep reinforcement learning (DRL) algorithm, called Advantage Actor Critic-Graph Searching (A2C-GS), for network topology optimization. A2C-GS consists of three novel components, including a verifier to validate the correctness of a generated network topology, a graph neural network (GNN) to efficiently approximate topology rating, and a DRL actor layer to conduct a topology search. A2C-GS can efficiently search over a large topology space and output topologies with satisfactory performance. We conduct a case study based on a real network scenario, and our experimental results demonstrate the superior performance of A2C-GS in terms of both efficiency and performance.  ( 2 min )
    Goldilocks-curriculum Domain Randomization and Fractal Perlin Noise with Application to Sim2Real Pneumonia Lesion Detection. (arXiv:2204.13849v1 [cs.CV])
    A computer-aided detection (CAD) system based on machine learning is expected to assist radiologists in making a diagnosis. It is desirable to build CAD systems for the various types of diseases accumulating daily in a hospital. An obstacle in developing a CAD system for a disease is that the number of medical images is typically too small to improve the performance of the machine learning model. In this paper, we aim to explore ways to address this problem through a sim2real transfer approach in medical image fields. To build a platform to evaluate the performance of sim2real transfer methods in the field of medical imaging, we construct a benchmark dataset that consists of $101$ chest X-ray images with difficult-to-identify pneumonia lesions judged by an experienced radiologist and a simulator based on fractal Perlin noise and the X-ray principle for generating pseudo pneumonia lesions. We then develop a novel domain randomization method, called Goldilocks-curriculum domain randomization (GDR), and evaluate our method on this platform.  ( 2 min )
    A review of Federated Learning in Intrusion Detection Systems for IoT. (arXiv:2204.12443v2 [cs.CR] UPDATED)
    Intrusion detection systems are evolving into intelligent systems that perform data analysis searching for anomalies in their environment. The development of deep learning technologies opened the door to building more complex and effective threat detection models. However, training those models may be computationally infeasible in most Internet of Things devices. Current approaches rely on powerful centralized servers that receive data from all their parties -- violating basic privacy constraints and substantially affecting response times and operational costs due to the huge communication overheads. To mitigate these issues, Federated Learning emerged as a promising approach where different agents collaboratively train a shared model, neither exposing training data to others nor requiring a compute-intensive centralized infrastructure. This paper focuses on the application of Federated Learning approaches in the field of Intrusion Detection. Both technologies are described in detail and current scientific progress is reviewed and categorized. Finally, the paper highlights the limitations present in recent works and presents some future directions for this technology.
    Forecasting large-scale circulation regimes using deformable convolutional neural networks and global spatiotemporal climate data. (arXiv:2202.04964v2 [cs.LG] UPDATED)
    Classifying the state of the atmosphere into a finite number of large-scale circulation regimes is a popular way of investigating teleconnections, the predictability of severe weather events, and climate change. Here, we investigate a supervised machine learning approach based on deformable convolutional neural networks (deCNNs) and transfer learning to forecast the North Atlantic-European weather regimes during extended boreal winter for 1 to 15 days into the future. We apply state-of-the-art interpretation techniques from the machine learning literature to attribute particular regions of interest or potential teleconnections relevant for any given weather cluster prediction or regime transition. We demonstrate superior forecasting performance relative to several classical meteorological benchmarks, as well as logistic regression and random forests. We also observe that deCNN, owing to its wider field of view, achieves considerably better performance than regular convolutional neural networks at lead times beyond 5-6 days. Finally, we find transfer learning to be of paramount importance, similar to previous data-driven atmospheric forecasting studies.
    SleepPPG-Net: a deep learning algorithm for robust sleep staging from continuous photoplethysmography. (arXiv:2202.05735v4 [cs.LG] UPDATED)
    Introduction: Sleep staging is an essential component in the diagnosis of sleep disorders and management of sleep health. It is traditionally measured in a clinical setting and requires a labor-intensive labeling process. We hypothesize that it is possible to perform robust 4-class sleep staging using the raw photoplethysmography (PPG) time series and modern advances in deep learning (DL). Methods: We used two publicly available sleep databases that included raw PPG recordings, totalling 2,374 patients and 23,055 hours. We developed SleepPPG-Net, a DL model for 4-class sleep staging from the raw PPG time series. SleepPPG-Net was trained end-to-end and consists of a residual convolutional network for automatic feature extraction and a temporal convolutional network to capture long-range contextual information. We benchmarked the performance of SleepPPG-Net against models based on the best-reported state-of-the-art (SOTA) algorithms. Results: When benchmarked on a held-out test set, SleepPPG-Net obtained a median Cohen's Kappa ($\kappa$) score of 0.75 against 0.69 for the best SOTA approach. SleepPPG-Net showed good generalization performance to an external database, obtaining a $\kappa$ score of 0.74 after transfer learning. Perspective: Overall, SleepPPG-Net provides new SOTA performance. In addition, performance is high enough to open the path to the development of wearables that meet the requirements for usage in clinical applications such as the diagnosis and monitoring of obstructive sleep apnea.
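    The Results above are reported as Cohen's kappa. As a reading aid, here is a minimal sketch of how that statistic is computed from reference and predicted labels. The function name `cohen_kappa` and the toy sleep-stage labels are hypothetical and are not taken from the paper or its data.

```python
from collections import Counter


def cohen_kappa(reference, predicted):
    """Cohen's kappa: agreement between two label sequences, corrected for chance."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(reference)
    # Observed agreement: fraction of positions where the labels match.
    observed = sum(r == p for r, p in zip(reference, predicted)) / n
    # Chance agreement: sum over classes of the product of marginal frequencies.
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    expected = sum(ref_counts[c] * pred_counts[c] for c in ref_counts) / (n * n)
    if expected == 1.0:
        # Both sequences use a single identical class; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy example with made-up stage labels (not data from the paper).
reference = ["wake", "wake", "light", "light"]
predicted = ["wake", "light", "light", "wake"]
kappa = cohen_kappa(reference, predicted)
```

    A kappa of 1 means perfect agreement and 0 means agreement no better than chance, which is why it is preferred over raw accuracy when class frequencies are unbalanced.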
    Exploration and Exploitation in Federated Learning to Exclude Clients with Poisoned Data. (arXiv:2204.14020v1 [cs.DC])
    Federated Learning (FL) is one of the hot research topics, and it utilizes Machine Learning (ML) in a distributed manner without directly accessing private data on clients. However, FL faces many challenges, including the difficulty of obtaining high accuracy, high communication cost between clients and the server, and security attacks related to adversarial ML. To tackle these three challenges, we propose an FL algorithm inspired by evolutionary techniques. The proposed algorithm groups clients randomly in many clusters, each with a model selected randomly to explore the performance of different models. The clusters are then trained in a repetitive process where the worst performing cluster is removed in each iteration until one cluster remains. In each iteration, some clients are expelled from clusters either due to using poisoned data or low performance. The surviving clients are exploited in the next iteration. The remaining cluster with surviving clients is then used for training the best FL model (i.e., remaining FL model). Communication cost is reduced since fewer clients are used in the final training of the FL model. To evaluate the performance of the proposed algorithm, we conduct a number of experiments using the FEMNIST dataset and compare the results against the random FL algorithm. The experimental results show that the proposed algorithm outperforms the baseline algorithm in terms of accuracy, communication cost, and security.
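    The elimination loop described above can be illustrated with a toy sketch. This is not the paper's implementation: each client is reduced to a single hypothetical quality score, no model is trained, clients of a removed cluster are simply dropped, and the name `eliminate_clusters` and the `min_score` threshold are assumptions made for illustration.

```python
def eliminate_clusters(clusters, min_score):
    """Repeatedly expel weak clients and drop the worst cluster until one remains.

    clusters:  dict mapping a cluster name to a list of per-client scores
               (a stand-in for measured client performance; an assumption).
    min_score: clients scoring below this value are expelled (an assumption).
    """
    if not clusters:
        raise ValueError("at least one cluster is required")
    active = {name: list(scores) for name, scores in clusters.items()}

    def mean(scores):
        # An emptied cluster ranks below every non-empty one.
        return sum(scores) / len(scores) if scores else float("-inf")

    while len(active) > 1:
        # Expel clients that look poisoned or low-performing.
        for name in active:
            active[name] = [s for s in active[name] if s >= min_score]
        # Remove the worst-performing cluster in this iteration.
        worst = min(active, key=lambda name: mean(active[name]))
        del active[worst]

    (name, survivors), = active.items()
    return name, survivors


# Toy usage with made-up scores.
winner, survivors = eliminate_clusters(
    {"a": [0.9, 0.2, 0.8], "b": [0.6, 0.5], "c": [0.3, 0.4]},
    min_score=0.35,
)
```

    In this toy run the surviving cluster holds fewer clients than the initial population, which mirrors the abstract's point that final training involves fewer clients and therefore less communication.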
    The Geometry of Memoryless Stochastic Policy Optimization in Infinite-Horizon POMDPs. (arXiv:2110.07409v3 [math.OC] UPDATED)
    We consider the problem of finding the best memoryless stochastic policy for an infinite-horizon partially observable Markov decision process (POMDP) with finite state and action spaces with respect to either the discounted or mean reward criterion. We show that the (discounted) state-action frequencies and the expected cumulative reward are rational functions of the policy, whereby the degree is determined by the degree of partial observability. We then describe the optimization problem as a linear optimization problem in the space of feasible state-action frequencies subject to polynomial constraints that we characterize explicitly. This allows us to address the combinatorial and geometric complexity of the optimization problem using recent tools from polynomial optimization. In particular, we estimate the number of critical points and use the polynomial programming description of reward maximization to solve a navigation problem in a grid world.
    Towards Exemplar-Free Continual Learning in Vision Transformers: an Account of Attention, Functional and Weight Regularization. (arXiv:2203.13167v3 [cs.CV] UPDATED)
    In this paper, we investigate the continual learning of Vision Transformers (ViT) for the challenging exemplar-free scenario, with special focus on how to efficiently distill the knowledge of its crucial self-attention mechanism (SAM). Our work takes an initial step towards a surgical investigation of SAM for designing coherent continual learning methods in ViTs. We first carry out an evaluation of established continual learning regularization techniques. We then examine the effect of regularization when applied to two key enablers of SAM: (a) the contextualized embedding layers, for their ability to capture well-scaled representations with respect to the values, and (b) the prescaled attention maps, for carrying value-independent global contextual information. We depict the perks of each distilling strategy on two image recognition benchmarks (CIFAR100 and ImageNet-32) -- while (a) leads to a better overall accuracy, (b) helps enhance the rigidity by maintaining competitive performances. Furthermore, we identify the limitation imposed by the symmetric nature of regularization losses. To alleviate this, we propose an asymmetric variant and apply it to the pooled output distillation (POD) loss adapted for ViTs. Our experiments confirm that introducing asymmetry to POD boosts its plasticity while retaining stability across (a) and (b). Moreover, we acknowledge low forgetting measures for all the compared methods, indicating that ViTs might be naturally inclined continual learner
    Fiber Bundle Morphisms as a Framework for Modeling Many-to-Many Maps. (arXiv:2203.08189v2 [cs.LG] UPDATED)
    While it is not generally reflected in the 'nice' datasets used for benchmarking machine learning algorithms, the real world is full of processes that would be best described as many-to-many. That is, a single input can potentially yield many different outputs (whether due to noise, imperfect measurement, or intrinsic stochasticity in the process) and many different inputs can yield the same output (that is, the map is not injective). For example, imagine a sentiment analysis task where, due to linguistic ambiguity, a single statement can have a range of different sentiment interpretations while at the same time many distinct statements can represent the same sentiment. When modeling such a multivalued function $f: X \rightarrow Y$, it is frequently useful to be able to model the distribution on $f(x)$ for specific input $x$ as well as the distribution on fiber $f^{-1}(y)$ for specific output $y$. Such an analysis helps the user (i) better understand the variance intrinsic to the process they are studying and (ii) understand the range of specific input $x$ that can be used to achieve output $y$. Following existing work which used a fiber bundle framework to better model many-to-one processes, we describe how morphisms of fiber bundles provide a template for building models which naturally capture the structure of many-to-many processes.  ( 2 min )
    Reducing Neural Architecture Search Spaces with Training-Free Statistics and Computational Graph Clustering. (arXiv:2204.14103v1 [cs.LG])
    The computational demands of neural architecture search (NAS) algorithms are usually directly proportional to the size of their target search spaces. Thus, limiting the search to high-quality subsets can greatly reduce the computational load of NAS algorithms. In this paper, we present Clustering-Based REDuction (C-BRED), a new technique to reduce the size of NAS search spaces. C-BRED reduces a NAS space by clustering the computational graphs associated with its architectures and selecting the most promising cluster using proxy statistics correlated with network accuracy. When considering the NAS-Bench-201 (NB201) data set and the CIFAR-100 task, C-BRED selects a subset with 70% average accuracy instead of the whole space's 64% average accuracy.  ( 2 min )
    Abstraction for Deep Reinforcement Learning. (arXiv:2202.05839v3 [cs.LG] UPDATED)
    We characterise the problem of abstraction in the context of deep reinforcement learning. Various well established approaches to analogical reasoning and associative memory might be brought to bear on this issue, but they present difficulties because of the need for end-to-end differentiability. We review developments in AI and machine learning that could facilitate their adoption.
    Unsolved Problems in ML Safety. (arXiv:2109.13916v4 [cs.LG] UPDATED)
    Machine learning (ML) systems are rapidly increasing in size, are acquiring new capabilities, and are increasingly deployed in high-stakes settings. As with other powerful technologies, safety for ML should be a leading research priority. In response to emerging safety challenges in ML, such as those introduced by recent large-scale models, we provide a new roadmap for ML Safety and refine the technical problems that the field needs to address. We present four problems ready for research, namely withstanding hazards ("Robustness"), identifying hazards ("Monitoring"), reducing inherent model hazards ("Alignment"), and reducing systemic hazards ("Systemic Safety"). Throughout, we clarify each problem's motivation and provide concrete research directions.
    KnowAugNet: Multi-Source Medical Knowledge Augmented Medication Prediction Network with Multi-Level Graph Contrastive Learning. (arXiv:2204.11736v2 [cs.AI] UPDATED)
    Predicting medications is a crucial task in many intelligent healthcare systems. It can assist doctors in making informed medication decisions for patients according to electronic medical records (EMRs). However, medication prediction is a challenging data mining task due to the complex relations between medical codes. Most existing studies focus on utilizing inherent relations between homogeneous codes of the medical ontology graph to enhance their representations using supervised methods, and few studies pay attention to the valuable relations between heterogeneous or homogeneous medical codes from historical EMRs, which further limits the prediction performance and application scenarios. Therefore, to address these limitations, this paper proposes KnowAugNet, a multi-sourced medical knowledge augmented medication prediction network which can fully capture the diverse relations between medical codes via a multi-level graph contrastive learning framework. Specifically, KnowAugNet first leverages graph contrastive learning with a graph attention network as the encoder to capture the implicit relations between homogeneous medical codes from the medical ontology graph and obtains the knowledge augmented medical code embedding vectors. Then, it utilizes graph contrastive learning with a weighted graph convolutional network as the encoder to capture the correlative relations between homogeneous or heterogeneous medical codes from the constructed medical prior relation graph and obtains the relation augmented medical code embedding vectors. Finally, the augmented medical code embedding vectors and the supervised medical code embedding vectors are retrieved and input to the sequential learning network to capture the temporal relations of medical codes and predict medications for patients.  ( 2 min )
    Rockafellian Relaxation in Optimization under Uncertainty: Asymptotically Exact Formulations. (arXiv:2204.04762v2 [math.OC] UPDATED)
    In practice, optimization models are often prone to unavoidable inaccuracies due to lack of data and dubious assumptions. Traditionally, this placed special emphasis on risk-based and robust formulations, and their focus on "conservative" decisions. We develop, in contrast, an "optimistic" framework based on Rockafellian relaxations in which optimization is conducted not only over the original decision space but also jointly with a choice of model perturbation. The framework enables us to address challenging problems with ambiguous probability distributions from the areas of two-stage stochastic optimization without relatively complete recourse, probability functions lacking continuity properties, expectation constraints, and outlier analysis. We are also able to circumvent the fundamental difficulty in stochastic optimization that convergence of distributions fails to guarantee convergence of expectations. The framework centers on the novel concepts of exact and asymptotically exact Rockafellians, with interpretations of "negative" regularization emerging in certain settings. We illustrate the role of Phi-divergence, examine rates of convergence under changing distributions, and explore extensions to first-order optimality conditions. The main development is free of assumptions about convexity, smoothness, and even continuity of objective functions.
    RAMP-CNN: A Novel Neural Network for Enhanced Automotive Radar Object Recognition. (arXiv:2011.08981v2 [eess.SP] UPDATED)
    Millimeter-wave radars are being increasingly integrated into commercial vehicles to support new advanced driver-assistance systems by enabling robust and high-performance object detection, localization, as well as recognition - a key component of new environmental perception. In this paper, we propose a novel radar multiple-perspectives convolutional neural network (RAMP-CNN) that extracts the location and class of objects based on further processing of the range-velocity-angle (RVA) heatmap sequences. To bypass the complexity of 4D convolutional neural networks (NN), we propose to combine several lower-dimension NN models within our RAMP-CNN model that nonetheless approaches the performance upper-bound with lower complexity. The extensive experiments show that the proposed RAMP-CNN model achieves better average recall and average precision than prior works in all testing scenarios. In addition, the RAMP-CNN model is validated to work robustly at night, which makes low-cost radars a potential substitute for pure optical sensing under severe conditions.  ( 2 min )
    Zero Experience Required: Plug & Play Modular Transfer Learning for Semantic Visual Navigation. (arXiv:2202.02440v2 [cs.CV] UPDATED)
    In reinforcement learning for visual navigation, it is common to develop a model for each new task, and train that model from scratch with task-specific interactions in 3D environments. However, this process is expensive; massive amounts of interactions are needed for the model to generalize well. Moreover, this process is repeated whenever there is a change in the task type or the goal modality. We present a unified approach to visual navigation using a novel modular transfer learning model. Our model can effectively leverage its experience from one source task and apply it to multiple target tasks (e.g., ObjectNav, RoomNav, ViewNav) with various goal modalities (e.g., image, sketch, audio, label). Furthermore, our model enables zero-shot experience learning, whereby it can solve the target tasks without receiving any task-specific interactive training. Our experiments on multiple photorealistic datasets and challenging tasks show that our approach learns faster, generalizes better, and outperforms SoTA models by a significant margin.
    Human's Role in-the-Loop. (arXiv:2204.14192v1 [cs.DB])
    Data integration has been recently challenged by the need to handle large volumes of data, arriving at high velocity from a variety of sources, which demonstrate varying levels of veracity. This challenging setting, often referred to as big data, renders many of the existing techniques, especially those that are human-intensive, obsolete. Big data also produces technological advancements such as Internet of things, cloud computing, and deep learning, and accordingly, provides a new, exciting, and challenging research agenda. Given the availability of data and the improvement of machine learning techniques, this blog discusses the respective roles of humans and machines in achieving cognitive tasks in matching, aiming to determine whether traditional roles of humans and machines are subject to change. Such investigation, we believe, will pave the way to better utilize both human and machine resources in new and innovative manners. We shall discuss two possible modes of change, namely humans out and humans in. Humans out aims at exploring out-of-the-box latent matching reasoning using machine learning algorithms when attempting to overpower human matcher performance. Pursuing out-of-the-box thinking, machine and deep learning can be involved in matching. Humans in explores how to better involve humans in the matching loop by assigning human matchers a role symmetric to that of algorithmic matchers in the matching process.  ( 2 min )
    Learning Anisotropic Interaction Rules from Individual Trajectories in a Heterogeneous Cellular Population. (arXiv:2204.14141v1 [q-bio.QM])
    Interacting particle system (IPS) models have proven to be highly successful for describing the spatial movement of organisms. However, it has proven challenging to infer the interaction rules directly from data. In the field of equation discovery, the Weak form Sparse Identification of Nonlinear Dynamics (WSINDy) methodology has been shown to be very computationally efficient for identifying the governing equations of complex systems, even in the presence of substantial noise. Motivated by the success of IPS models to describe the spatial movement of organisms, we develop WSINDy for second order IPSs to model the movement of communities of cells. Specifically, our approach learns the directional interaction rules that govern the dynamics of a heterogeneous population of migrating cells. Rather than aggregating cellular trajectory data into a single best-fit model, we learn the models for each individual cell. These models can then be efficiently classified according to the active classes of interactions present in the model. From these classifications, aggregated models are constructed hierarchically to simultaneously identify different species of cells present in the population and determine best-fit models for each species. We demonstrate the efficiency and proficiency of the method on several test scenarios, motivated by common cell migration experiments.
    Analysing the Influence of Attack Configurations on the Reconstruction of Medical Images in Federated Learning. (arXiv:2204.13808v1 [eess.IV])
    The idea of federated learning is to train deep neural network models collaboratively and share them with multiple participants without exposing their private training data to each other. This is highly attractive in the medical domain due to the privacy of patient records. However, a recently proposed method called Deep Leakage from Gradients enables attackers to reconstruct data from shared gradients. This study shows how easy it is to reconstruct images for different data initialization schemes and distance measures. We show how data and model architecture influence the optimal choice of initialization scheme and distance measure configurations when working with single images. We demonstrate that the choice of initialization scheme and distance measure can significantly increase convergence speed and quality. Furthermore, we find that the optimal attack configuration depends largely on the nature of the target image distribution and the complexity of the model architecture.
    Pre-training helps Bayesian optimization too. (arXiv:2109.08215v3 [cs.LG] UPDATED)
    Bayesian optimization (BO) has become a popular strategy for global optimization of many expensive real-world functions. Contrary to a common belief that BO is suited to optimizing black-box functions, it actually requires domain knowledge on characteristics of those functions to deploy BO successfully. Such domain knowledge often manifests in Gaussian process priors that specify initial beliefs on functions. However, even with expert knowledge, it is not an easy task to select a prior. This is especially true for hyperparameter tuning problems on complex machine learning models, where landscapes of tuning objectives are often difficult to comprehend. We seek an alternative practice for setting these functional priors. In particular, we consider the scenario where we have data from similar functions that allow us to pre-train a tighter distribution a priori. Theoretically, we show a bounded regret of BO with pre-trained priors. To verify our approach in realistic model training setups, we collected a large multi-task hyperparameter tuning dataset by training tens of thousands of configurations of near-state-of-the-art models on popular image and text datasets, as well as a protein sequence dataset. Our results show that on average, our method is able to locate good hyperparameters at least 3 times more efficiently than the best competing methods.
    Data+Shift: Supporting visual investigation of data distribution shifts by data scientists. (arXiv:2204.14025v1 [cs.LG])
    Machine learning on data streams is increasingly present in multiple domains. However, there is often data distribution shift that can lead machine learning models to make incorrect decisions. While there are automatic methods to detect when drift is happening, human analysis, often by data scientists, is essential to diagnose the causes of the problem and adjust the system. We propose Data+Shift, a visual analytics tool to support data scientists in the task of investigating the underlying factors of shift in data features in the context of fraud detection. Design requirements were derived from interviews with data scientists. Data+Shift is integrated with JupyterLab and can be used alongside other data science tools. We validated our approach with a think-aloud experiment where a data scientist used the tool for a fraud detection use case.
    H2H: Heterogeneous Model to Heterogeneous System Mapping with Computation and Communication Awareness. (arXiv:2204.13852v1 [cs.LG])
    The complex nature of real-world problems calls for heterogeneity in both machine learning (ML) models and hardware systems. The heterogeneity in ML models comes from multi-sensor perceiving and multi-task learning, i.e., multi-modality multi-task (MMMT), resulting in diverse deep neural network (DNN) layers and computation patterns. The heterogeneity in systems comes from diverse processing components, as it becomes the prevailing method to integrate multiple dedicated accelerators into one system. Therefore, a new problem emerges: heterogeneous model to heterogeneous system mapping (H2H). While previous mapping algorithms mostly focus on efficient computations, in this work, we argue that it is indispensable to consider computation and communication simultaneously for better system efficiency. We propose a novel H2H mapping algorithm with both computation and communication awareness; by slightly trading computation for communication, the overall system latency and energy consumption can be largely reduced. The superior performance of our work is evaluated based on MAESTRO modeling, demonstrating 15%-74% latency reduction and 23%-64% energy reduction compared with existing computation-prioritized mapping algorithms.  ( 2 min )
    Controlled Generation of Unseen Faults for Partial and OpenSet&Partial Domain Adaptation. (arXiv:2204.14068v1 [cs.LG])
    New operating conditions can result in a performance drop of fault diagnostics models due to the domain gap between the training and the testing data distributions. While several domain adaptation approaches have been proposed to overcome such domain shifts, their application is limited if the label spaces of the two domains are not congruent. To improve the transferability of the trained models, particularly in setups where only the healthy data class is shared between the two domains, we propose a new framework based on a Wasserstein GAN for Partial and OpenSet&Partial domain adaptation. The main contribution is the controlled fault data generation, which enables the generation of unobserved fault types and severity levels in the target domain with access only to the healthy samples in the target domain and faulty samples in the source domain. To evaluate the ability of the proposed method to bridge domain gaps in different domain adaptation settings, we conduct Partial as well as OpenSet&Partial domain adaptation experiments on two bearing fault diagnostics case studies. The results show the versatility of the framework and that the synthetically generated fault data helps bridge the domain gaps, especially in instances where the domain gap is large.
    Towards Generalizable Semantic Product Search by Text Similarity Pre-training on Search Click Logs. (arXiv:2204.05231v2 [cs.IR] UPDATED)
    Recently, semantic search has been successfully applied to e-commerce product search and the learned semantic space(s) for query and product encoding are expected to generalize to unseen queries or products. Yet, whether generalization can conveniently emerge has not been thoroughly studied in the domain thus far. In this paper, we examine several general-domain and domain-specific pre-trained Roberta variants and discover that general-domain fine-tuning does not help generalization, which aligns with the discovery of prior art. Proper domain-specific fine-tuning with clickstream data can lead to better model generalization, based on a bucketed analysis of a publicly available manual annotated query-product pair da  ( 2 min )
    Recommendations on test datasets for evaluating AI solutions in pathology. (arXiv:2204.14226v1 [eess.IV])
    Artificial intelligence (AI) solutions that automatically extract information from digital histology images have shown great promise for improving pathological diagnosis. Prior to routine use, it is important to evaluate their predictive performance and obtain regulatory approval. This assessment requires appropriate test datasets. However, compiling such datasets is challenging and specific recommendations are missing. A committee of various stakeholders, including commercial AI developers, pathologists, and researchers, discussed key aspects and conducted extensive literature reviews on test datasets in pathology. Here, we summarize the results and derive general recommendations for the collection of test datasets. We address several questions: Which and how many images are needed? How to deal with low-prevalence subsets? How can potential bias be detected? How should datasets be reported? What are the regulatory requirements in different countries? The recommendations are intended to help AI developers demonstrate the utility of their products and to help regulatory agencies and end users verify reported performance measures. Further research is needed to formulate criteria for sufficiently representative test datasets so that AI solutions can operate with less user intervention and better support diagnostic workflows in the future.  ( 2 min )
    Semi-Discrete Optimal Transport: Hardness, Regularization and Numerical Solution. (arXiv:2103.06263v2 [cs.LG] UPDATED)
    Semi-discrete optimal transport problems, which evaluate the Wasserstein distance between a discrete and a generic (possibly non-discrete) probability measure, are believed to be computationally hard. Even though such problems are ubiquitous in statistics, machine learning and computer vision, this perception has not yet received a theoretical justification. To fill this gap, we prove that computing the Wasserstein distance between a discrete probability measure supported on two points and the Lebesgue measure on the standard hypercube is already #P-hard. This insight prompts us to seek approximate solutions for semi-discrete optimal transport problems. We thus perturb the underlying transportation cost with an additive disturbance governed by an ambiguous probability distribution, and we introduce a distributionally robust dual optimal transport problem whose objective function is smoothed with the most adverse disturbance distributions from within a given ambiguity set. We further show that smoothing the dual objective function is equivalent to regularizing the primal objective function, and we identify several ambiguity sets that give rise to several known and new regularization schemes. As a byproduct, we discover an intimate relation between semi-discrete optimal transport problems and discrete choice models traditionally studied in psychology and economics. To solve the regularized optimal transport problems efficiently, we use a stochastic gradient descent algorithm with imprecise stochastic gradient oracles. A new convergence analysis reveals that this algorithm improves the best known convergence guarantee for semi-discrete optimal transport problems with entropic regularizers.
    Testing the Generalization of Neural Language Models for COVID-19 Misinformation Detection. (arXiv:2111.07819v3 [cs.CL] UPDATED)
    A drastic rise in potentially life-threatening misinformation has been a by-product of the COVID-19 pandemic. Computational support to identify false information within the massive body of data on the topic is crucial to prevent harm. Researchers proposed many methods for flagging online misinformation related to COVID-19. However, these methods predominantly target specific content types (e.g., news) or platforms (e.g., Twitter). The methods' capabilities to generalize were largely unclear so far. We evaluate fifteen Transformer-based models on five COVID-19 misinformation datasets that include social media posts, news articles, and scientific papers to fill this gap. We show tokenizers and models tailored to COVID-19 data do not provide a significant advantage over general-purpose ones. Our study provides a realistic assessment of models for detecting COVID-19 misinformation. We expect that evaluating a broad spectrum of datasets and models will benefit future research in developing misinformation detection systems.
    To Trust or Not To Trust Prediction Scores for Membership Inference Attacks. (arXiv:2111.09076v2 [cs.LG] UPDATED)
    Membership inference attacks (MIAs) aim to determine whether a specific sample was used to train a predictive model. Knowing this may indeed lead to a privacy breach. Most MIAs, however, make use of the model's prediction scores - the probability of each output given some input - following the intuition that the trained model tends to behave differently on its training data. We argue that this is a fallacy for many modern deep network architectures. Consequently, MIAs will miserably fail since overconfidence leads to high false-positive rates not only on known domains but also on out-of-distribution data and implicitly acts as a defense against MIAs. Specifically, using generative adversarial networks, we are able to produce a potentially infinite number of samples falsely classified as part of the training data. In other words, the threat of MIAs is overestimated, and less information is leaked than previously assumed. Moreover, there is actually a trade-off between the overconfidence of models and their susceptibility to MIAs: the more classifiers know when they do not know, making low confidence predictions, the more they reveal the training data.
    Autonomous In-Situ Soundscape Augmentation via Joint Selection of Masker and Gain. (arXiv:2204.13883v1 [eess.AS])
    The selection of maskers and playback gain levels in a soundscape augmentation system is crucial to its effectiveness in improving the overall acoustic comfort of a given environment. Traditionally, the selection of appropriate maskers and gain levels has been informed by expert opinion, which may not be representative of the target population, or by listening tests, which can be time-consuming and labour-intensive. Furthermore, the resulting static choices of masker and gain are often inflexible to the dynamic nature of real-world soundscapes. In this work, we utilized a deep learning model to perform joint selection of the optimal masker and its gain level for a given soundscape. The proposed model was designed with highly modular building blocks, allowing for an optimized inference process that can quickly search through a large number of masker and gain combinations. In addition, we introduced the use of feature-domain soundscape augmentation conditioned on the digital gain level, eliminating the computationally expensive waveform-domain mixing process during inference time, as well as the tedious pre-calibration process required for new maskers. The proposed system was validated on a large-scale dataset of subjective responses to augmented soundscapes with more than 440 participants, ensuring the ability of the model to predict the combined effect of the masker and its gain level on the perceptual pleasantness level.  ( 2 min )
    Wide and Deep Neural Networks Achieve Optimality for Classification. (arXiv:2204.14126v1 [cs.LG])
    While neural networks are used for classification tasks across domains, a long-standing open problem in machine learning is determining whether neural networks trained using standard procedures are optimal for classification, i.e., whether such models minimize the probability of misclassification for arbitrary data distributions. In this work, we identify and construct an explicit set of neural network classifiers that achieve optimality. Since effective neural networks in practice are typically both wide and deep, we analyze infinitely wide networks that are also infinitely deep. In particular, using the recent connection between infinitely wide neural networks and Neural Tangent Kernels, we provide explicit activation functions that can be used to construct networks that achieve optimality. Interestingly, these activation functions are simple and easy to implement, yet differ from commonly used activations such as ReLU or sigmoid. More generally, we create a taxonomy of infinitely wide and deep networks and show that these models implement one of three well-known classifiers depending on the activation function used: (1) 1-nearest neighbor (model predictions are given by the label of the nearest training example); (2) majority vote (model predictions are given by the label of the class with greatest representation in the training set); or (3) singular kernel classifiers (a set of classifiers containing those that achieve optimality). Our results highlight the benefit of using deep networks for classification tasks, in contrast to regression tasks, where excessive depth is harmful.
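    As a rough illustration of the two simplest classifiers named in the taxonomy above, the sketch below implements 1-nearest neighbor and majority vote on a toy dataset. It is not the paper's construction: the function names, the Euclidean distance, and the toy data are all assumptions made for the example, and the singular kernel classifiers are not covered.

```python
import math
from collections import Counter

def nearest_neighbor_predict(train_x, train_y, x):
    """1-nearest neighbor: return the label of the training example closest to x.

    Euclidean distance is an assumption made for this sketch.
    """
    best_index = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
    return train_y[best_index]

def majority_vote_predict(train_y):
    """Majority vote: return the most frequent training label, ignoring the input."""
    return Counter(train_y).most_common(1)[0][0]

# Hypothetical toy data: three points of class "a" near the origin, one "b" far away.
train_x = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0)]
train_y = ["a", "a", "a", "b"]
query = (4.0, 4.5)

print(nearest_neighbor_predict(train_x, train_y, query))  # closest point is (5.0, 5.0)
print(majority_vote_predict(train_y))                     # class "a" dominates the training set
```

    On this data the two rules disagree for the query point, which is why the choice of activation function matters in the taxonomy: it decides which of these behaviours the limiting network reproduces.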
    A Framework for Constructing Machine Learning Models with Feature Set Optimisation for Evapotranspiration Partitioning. (arXiv:2204.14142v1 [cs.LG])
    A deeper understanding of the drivers of evapotranspiration and the modelling of its constituent parts (evaporation and transpiration) could be of significant importance to the monitoring and management of water resources globally over the coming decades. In this work, we developed a framework to identify the best performing machine learning algorithm from a candidate set, select optimal predictive features, and rank features in terms of their importance to predictive accuracy. Our experiments used 3 separate feature sets across 4 wetland sites as input into 8 candidate machine learning algorithms, providing 96 sets of experimental configurations. Given this high number of parameters, our results show strong evidence that there is no singularly optimal machine learning algorithm or feature set across all of the wetland sites studied despite their similarities. A key finding discovered when examining feature importance is that methane flux, a feature whose relationship with evapotranspiration is not generally examined, may contribute to further biophysical process understanding.
    Adversarially-regularized mixed effects deep learning (ARMED) models for improved interpretability, performance, and generalization on clustered data. (arXiv:2202.11783v3 [cs.LG] UPDATED)
    Natural science datasets frequently violate assumptions of independence. Samples may be clustered (e.g. by study site, subject, or experimental batch), leading to spurious associations, poor model fitting, and confounded analyses. While largely unaddressed in deep learning, this problem has been handled in the statistics community through mixed effects models, which separate cluster-invariant fixed effects from cluster-specific random effects. We propose a general-purpose framework for Adversarially-Regularized Mixed Effects Deep learning (ARMED) models through non-intrusive additions to existing neural networks: 1) an adversarial classifier constraining the original model to learn only cluster-invariant features, 2) a random effects subnetwork capturing cluster-specific features, and 3) an approach to apply random effects to clusters unseen during training. We apply ARMED to dense, convolutional, and autoencoder neural networks on 4 applications including simulated nonlinear data, dementia prognosis and diagnosis, and live-cell image analysis. Compared to prior techniques, ARMED models better distinguish confounded from true associations in simulations and learn more biologically plausible features in clinical applications. They can also quantify inter-cluster variance and visualize cluster effects in data. Finally, ARMED improves accuracy on data from clusters seen during training (up to 28% vs. conventional models) and generalization to unseen clusters (up to 9% vs. conventional models).
    COVID-Net US-X: Enhanced Deep Neural Network for Detection of COVID-19 Patient Cases from Convex Ultrasound Imaging Through Extended Linear-Convex Ultrasound Augmentation Learning. (arXiv:2204.13851v1 [eess.IV])
    As the global population continues to face significant negative impact from the ongoing COVID-19 pandemic, there has been an increasing usage of point-of-care ultrasound (POCUS) imaging as a low-cost and effective imaging modality of choice in the COVID-19 clinical workflow. A major barrier to widespread adoption of POCUS in the COVID-19 clinical workflow is the scarcity of expert clinicians who can interpret POCUS examinations, leading to considerable interest in deep learning-driven clinical decision support systems to tackle this challenge. A major challenge to building deep neural networks for COVID-19 screening using POCUS is the heterogeneity in the types of probes used to capture ultrasound images (e.g., convex vs. linear probes), which can lead to very different visual appearances. In this study, we explore the impact of leveraging extended linear-convex ultrasound augmentation learning on producing enhanced deep neural networks for COVID-19 assessment, where we conduct data augmentation on convex probe data alongside linear probe data that have been transformed to better resemble convex probe data. Experimental results using an efficient deep columnar anti-aliased convolutional neural network designed via a machine-driven design exploration strategy (which we name COVID-Net US-X) show that the proposed extended linear-convex ultrasound augmentation learning significantly increases performance, with a gain of 5.1% in test accuracy and 13.6% in AUC.
    PokeBNN: A Binary Pursuit of Lightweight Accuracy. (arXiv:2112.00133v2 [cs.LG] UPDATED)
    Optimization of Top-1 ImageNet promotes enormous networks that may be impractical in inference settings. Binary neural networks (BNNs) have the potential to significantly lower the compute intensity but existing models suffer from low quality. To overcome this deficiency, we propose PokeConv, a binary convolution block which improves quality of BNNs by techniques such as adding multiple residual paths, and tuning the activation function. We apply it to ResNet-50 and optimize ResNet's initial convolutional layer which is hard to binarize. We name the resulting network family PokeBNN. These techniques are chosen to yield favorable improvements in both top-1 accuracy and the network's cost. In order to enable joint optimization of the cost together with accuracy, we define arithmetic computation effort (ACE), a hardware- and energy-inspired cost metric for quantized and binarized networks. We also identify a need to optimize an under-explored hyper-parameter controlling the binarization gradient approximation. We establish a new, strong state-of-the-art (SOTA) on top-1 accuracy together with commonly-used CPU64 cost, ACE cost and network size metrics. ReActNet-Adam, the previous SOTA in BNNs, achieved a 70.5% top-1 accuracy with 7.9 ACE. A small variant of PokeBNN achieves 70.5% top-1 with 2.6 ACE, more than 3x reduction in cost; a larger PokeBNN achieves 75.6% top-1 with 7.8 ACE, more than 5% improvement in accuracy without increasing the cost. PokeBNN implementation in JAX/Flax and reproduction instructions are available in AQT repository: https://github.com/google/aqt
    Altruist: Argumentative Explanations through Local Interpretations of Predictive Models. (arXiv:2010.07650v2 [cs.LG] UPDATED)
    Explainable AI is an emerging field providing solutions for acquiring insights into automated systems' rationale. It has been put on the AI map by suggesting ways to tackle key ethical and societal issues. Existing explanation techniques are often not comprehensible to the end user. Lack of evaluation and selection criteria also makes it difficult for the end user to choose the most suitable technique. In this study, we combine logic-based argumentation with Interpretable Machine Learning, introducing a preliminary meta-explanation methodology that identifies the truthful parts of feature importance oriented interpretations. This approach, in addition to being used as a meta-explanation technique, can be used as an evaluation or selection tool for multiple feature importance techniques. Experimentation strongly indicates that an ensemble of multiple interpretation techniques yields considerably more truthful explanations.
    Identifying Critical LMS Features for Predicting At-risk Students. (arXiv:2204.13700v1 [cs.LG])
    Learning management systems (LMSs) have become essential in higher education and play an important role in helping educational institutions to promote student success. Traditionally, LMSs have been used by postsecondary institutions in administration, reporting, and delivery of educational content. In this paper, we present an additional use of LMSs: using their data logs to perform data analytics and identify academically at-risk students. The data-driven insights would allow educational institutions and educators to develop and implement pedagogical interventions targeting academically at-risk students. We used anonymized data logs created by Brightspace LMS during fall 2019, spring 2020, and fall 2020 semesters at our college. Supervised machine learning algorithms were used to predict the final course performance of students, and several algorithms were found to perform well with accuracy above 90%. The SHAP value method was used to assess the relative importance of features used in the predictive models. Unsupervised learning was also used to group students into different clusters based on the similarities in their interaction/involvement with the LMS. In both supervised and unsupervised learning, we identified the two most important features (Number_Of_Assignment_Submissions and Content_Completed). More importantly, our study lays a foundation and provides a framework for developing a real-time data analytics metric that may be incorporated into an LMS.  ( 2 min )
    Multimodal Transformer-based Model for Buchwald-Hartwig and Suzuki-Miyaura Reaction Yield Prediction. (arXiv:2204.14062v1 [cs.LG])
    Predicting the yield percentage of a chemical reaction is useful in many respects, such as reducing wet-lab experimentation by giving priority to the reactions with a high predicted yield. In this work we investigated the use of multiple input types to predict chemical reaction yield. We used simplified molecular-input line-entry system (SMILES) as well as calculated chemical descriptors as model inputs. The model consists of a pre-trained bidirectional transformer-based encoder (BERT) and a multi-layer perceptron (MLP) with a regression head to predict the yield. We experimented on two high throughput experimentation (HTE) datasets for Buchwald-Hartwig and Suzuki-Miyaura reactions. The experiments show improvements in the prediction on both datasets compared to systems using only SMILES or chemical descriptors as input. We also tested the model's performance on out-of-sample dataset splits of Buchwald-Hartwig and achieved comparable results with the state-of-the-art. In addition to predicting the yield, we demonstrated the model's ability to suggest the optimum (highest yield) reaction conditions. The model was able to suggest conditions that achieve 94% of the optimum reported yields. This proves the model to be useful in achieving the best results in the wet lab without expensive experimentation.
    Few-shot learning for medical text: A systematic review. (arXiv:2204.14081v1 [cs.CL])
    Objective: Few-shot learning (FSL) methods require small numbers of labeled instances for training. As many medical topics have limited annotated textual data in practical settings, FSL-based natural language processing (NLP) methods hold substantial promise. We aimed to conduct a systematic review to explore the state of FSL methods for medical NLP. Materials and Methods: We searched for articles published between January 2016 and August 2021 using PubMed/Medline, Embase, ACL Anthology, and IEEE Xplore Digital Library. To identify the latest relevant methods, we also searched other sources such as preprint servers (e.g., medRxiv) via Google Scholar. We included all articles that involved FSL and any type of medical text. We abstracted articles based on data source(s), aim(s), training set size(s), primary method(s)/approach(es), and evaluation method(s). Results: 31 studies met our inclusion criteria, all published after 2018; 22 (71%) since 2020. Concept extraction/named entity recognition was the most frequently addressed task (13/31; 42%), followed by text classification (10/31; 32%). Twenty-one (68%) studies reconstructed existing datasets to create few-shot scenarios synthetically, and MIMIC-III was the most frequently used dataset (7/31; 23%). Common methods included FSL with attention mechanisms (12/31; 39%), prototypical networks (8/31; 26%), and meta-learning (6/31; 19%). Discussion: Despite the potential for FSL in biomedical NLP, progress has been limited compared to domain-independent FSL. This may be due to the paucity of standardized, public datasets, and the relative underperformance of FSL methods on biomedical topics. Creation and release of specialized datasets for biomedical FSL may aid method development by enabling comparative analyses.
    MarkovGNN: Graph Neural Networks on Markov Diffusion. (arXiv:2202.02470v2 [cs.LG] UPDATED)
    Most real-world networks contain well-defined community structures where nodes are densely connected internally within communities. To learn from these networks, we develop MarkovGNN that captures the formation and evolution of communities directly in different convolutional layers. Unlike most Graph Neural Networks (GNNs) that consider a static graph at every layer, MarkovGNN generates different stochastic matrices using a Markov process and then uses these community-capturing matrices in different layers. MarkovGNN is a general approach that could be used with most existing GNNs. We experimentally show that MarkovGNN outperforms other GNNs for clustering, node classification, and visualization tasks. The source code of MarkovGNN is publicly available at https://github.com/HipGraph/MarkovGNN.
    GCN-FFNN: A Two-Stream Deep Model for Learning Solution to Partial Differential Equations. (arXiv:2204.13744v1 [cs.LG])
    This paper introduces a novel two-stream deep model based on graph convolutional network (GCN) architecture and feed-forward neural networks (FFNN) for learning the solution of nonlinear partial differential equations (PDEs). The model aims at incorporating both graph and grid input representations using two streams corresponding to GCN and FFNN models, respectively. Each stream layer receives and processes its own input representation. As opposed to FFNN which receives a grid-like structure, the GCN stream layer operates on graph input data where the neighborhood information is incorporated through the adjacency matrix of the graph. In this way, the proposed GCN-FFNN model learns from two types of input representations, i.e. grid and graph data, obtained via the discretization of the PDE domain. The GCN-FFNN model is trained in two phases. In the first phase, the model parameters of each stream are trained separately. Both streams employ the same error function to adjust their parameters by enforcing the models to satisfy the given PDE as well as its initial and boundary conditions on grid or graph collocation (training) data. In the second phase, the learned parameters of two-stream layers are frozen and their learned representation solutions are fed to fully connected layers whose parameters are learned using the previously used error function. The learned GCN-FFNN model is tested on test data located both inside and outside the PDE domain. The obtained numerical results demonstrate the applicability and efficiency of the proposed GCN-FFNN model over individual GCN and FFNN models on 1D-Burgers, 1D-Schrödinger, 2D-Burgers and 2D-Schrödinger equations.
    ROS-X-Habitat: Bridging the ROS Ecosystem with Embodied AI. (arXiv:2109.07703v3 [cs.RO] UPDATED)
    We introduce ROS-X-Habitat, a software interface that bridges the AI Habitat platform for embodied learning-based agents with other robotics resources via ROS. This interface not only offers standardized communication protocols between embodied agents and simulators, but also enables physically and photorealistic simulation that benefits the training and/or testing of vision-based embodied agents. With this interface, roboticists can evaluate their own Habitat RL agents in another ROS-based simulator or use Habitat Sim v2 as the test bed for their own robotic algorithms. Through in silico experiments, we demonstrate that ROS-X-Habitat has minimal impact on the navigation performance and simulation speed of a Habitat RGBD agent; that a standard set of ROS mapping, planning and navigation tools can run in Habitat Sim v2; and that a Habitat agent can run in the standard ROS simulator Gazebo.
    A study of tree-based methods and their combination. (arXiv:2204.13916v1 [stat.ML])
    Tree-based methods are popular machine learning techniques used in various fields. In this work, we review their foundations and a general framework, the importance sampled learning ensemble (ISLE), that accelerates their fitting process. Furthermore, we describe a model combination strategy called adaptive regression by mixing (ARM), which is feasible for tree-based methods via ISLE. Moreover, three modified ISLEs are proposed, and their performance is evaluated on real data sets.
    Randomized Smoothing under Attack: How Good is it in Pratice?. (arXiv:2204.14187v1 [cs.CR])
    Randomized smoothing is a recent and celebrated solution to certify the robustness of any classifier. While it indeed provides a theoretical robustness against adversarial attacks, the dimensionality of current classifiers necessarily imposes Monte Carlo approaches for its application in practice. This paper questions the effectiveness of randomized smoothing as a defense, against state of the art black-box attacks. This is a novel perspective, as previous research works considered the certification as an unquestionable guarantee. We first formally highlight the mismatch between a theoretical certification and the practice of attacks on classifiers. We then perform attacks on randomized smoothing as a defense. Our main observation is that there is a major mismatch in the settings of the RS for obtaining high certified robustness or when defeating black box attacks while preserving the classifier accuracy.
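    The Monte Carlo step mentioned above can be sketched as follows: the smoothed classifier's prediction is estimated by classifying many Gaussian-perturbed copies of the input and taking the most frequent label. This is a minimal sketch of the prediction step only, with no certification and no attack; `smoothed_predict`, `toy_classifier`, and all parameter values are hypothetical names and settings chosen for the example, not taken from the paper.

```python
import random
from collections import Counter

def smoothed_predict(base_classifier, x, sigma, num_samples, rng):
    """Estimate the smoothed prediction by majority vote over noisy copies of x.

    Each copy adds independent Gaussian noise with standard deviation sigma
    to every coordinate. Returns the most frequent label and the vote counts.
    """
    counts = Counter()
    for _ in range(num_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        counts[base_classifier(noisy)] += 1
    return counts.most_common(1)[0][0], counts

def toy_classifier(x):
    """Hypothetical base classifier: label 1 if the coordinates sum to a positive value."""
    return 1 if sum(x) > 0 else 0

rng = random.Random(0)  # fixed seed so the sketch is repeatable
label, counts = smoothed_predict(toy_classifier, [2.0, 2.0], sigma=0.5, num_samples=1000, rng=rng)
print(label, dict(counts))
```

    The number of samples and the noise level `sigma` are the settings of the kind the abstract refers to: they trade off the quality of the estimate against the cost of each prediction and the accuracy of the base classifier under noise.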
    Cost Effective MLaaS Federation: A Combinatorial Reinforcement Learning Approach. (arXiv:2204.13971v1 [cs.LG])
    With the advancement of deep learning techniques, major cloud providers and niche machine learning service providers have started to offer their cloud-based machine learning tools, also known as machine learning as a service (MLaaS), to the public. According to our measurement, for the same task, these MLaaSes from different providers have varying performance due to the proprietary datasets, models, etc. Federating different MLaaSes together allows us to improve the analytic performance further. However, naively aggregating results from different MLaaSes not only incurs significant monetary cost but also may lead to sub-optimal performance gain due to the introduction of possible false-positive results. In this paper, we propose Armol, a framework to federate the right selection of MLaaS providers to achieve the best possible analytic performance. We first design a word grouping algorithm to unify the output labels across different providers. We then present a deep combinatorial reinforcement learning-based approach to maximize the accuracy while minimizing the cost. The predictions from the selected providers are then aggregated together using carefully chosen ensemble strategies. The real-world trace-driven evaluation further demonstrates that Armol is able to achieve the same accuracy results with 67% less inference cost.
    Topological Data Analysis in Time Series: Temporal Filtration and Application to Single-Cell Genomics. (arXiv:2204.14048v1 [cs.LG])
    The absence of a conventional association between the cell-cell cohabitation and its emergent dynamics into cliques during development has hindered our understanding of how cell populations proliferate, differentiate, and compete, i.e. the cell ecology. With the recent advancement of the single-cell RNA-sequencing (RNA-seq), we can potentially describe such a link by constructing network graphs that characterize the similarity of the gene expression profiles of the cell-specific transcriptional programs, and analyzing these graphs systematically using the summary statistics informed by the algebraic topology. We propose the single-cell topological simplicial analysis (scTSA). Applying this approach to the single-cell gene expression profiles from local networks of cells in different developmental stages with different outcomes reveals a previously unseen topology of cellular ecology. These networks contain an abundance of cliques of single-cell profiles bound into cavities that guide the emergence of more complicated habitation forms. We visualize these ecological patterns with topological simplicial architectures of these networks, compared with the null models. Benchmarked on the single-cell RNA-seq data of zebrafish embryogenesis spanning 38,731 cells, 25 cell types and 12 time steps, our approach highlights the gastrulation as the most critical stage, consistent with consensus in developmental biology. As a nonlinear, model-independent, and unsupervised framework, our approach can also be applied to tracing multi-scale cell lineage, identifying critical stages, or creating pseudo-time series.
    No Task Left Behind: Multi-Task Learning of Knowledge Tracing and Option Tracing for Better Student Assessment. (arXiv:2204.14006v1 [cs.CY])
    Student assessment is one of the most fundamental tasks in the field of AI Education (AIEd). One of the most common approaches to student assessment is Knowledge Tracing (KT), which evaluates a student's knowledge state by predicting whether the student will answer a given question correctly or not. However, in the context of multiple choice (polytomous) questions, conventional KT approaches are limited in that they only consider the binary (dichotomous) correctness label (i.e., correct or incorrect), and disregard the specific option chosen by the student. Meanwhile, Option Tracing (OT) attempts to model a student by predicting which option they will choose for a given question, but overlooks the correctness information. In this paper, we propose Dichotomous-Polytomous Multi-Task Learning (DP-MTL), a multi-task learning framework that combines KT and OT for more precise student assessment. In particular, we show that the KT objective acts as a regularization term for OT in the DP-MTL framework, and propose an appropriate architecture for applying our method on top of existing deep learning-based KT models. We experimentally confirm that DP-MTL significantly improves both KT and OT performances, and also benefits downstream tasks such as Score Prediction (SP).
    RoSA: A Robust Self-Aligned Framework for Node-Node Graph Contrastive Learning. (arXiv:2204.13846v1 [cs.LG])
    Graph contrastive learning has made significant progress recently. However, existing works have rarely explored non-aligned node-node contrasting. In this paper, we propose a novel graph contrastive learning method named RoSA that focuses on utilizing non-aligned augmented views for node-level representation learning. First, we leverage the earth mover's distance to model the minimum effort to transform the distribution of one view to the other as our contrastive objective, which does not require alignment between views. Then we introduce adversarial training as an auxiliary method to increase sampling diversity and enhance the robustness of our model. Experimental results show that RoSA outperforms a series of graph contrastive learning frameworks on homophilous, non-homophilous and dynamic graphs, which validates the effectiveness of our work. To the best of our knowledge, RoSA is the first work that focuses on the non-aligned node-node graph contrastive learning problem. Our code is available at https://github.com/ZhuYun97/RoSA
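    As a rough illustration of the earth mover's distance mentioned above, the sketch below handles only the simplest case: two equal-size one-dimensional samples with uniform weights, where the optimal transport plan matches sorted values. This is not RoSA's contrastive objective, which operates on node embeddings; the function name `emd_1d` is hypothetical.

```python
def emd_1d(xs, ys):
    """Earth mover's distance between two equal-size 1D samples.

    Assumes uniform weights on every point. In one dimension the
    optimal plan pairs the i-th smallest x with the i-th smallest y.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in pairs) / len(xs)


if __name__ == "__main__":
    # Shifting every point by 5 costs exactly 5 units of "effort" per point.
    print(emd_1d([0.0, 1.0, 3.0], [5.0, 6.0, 8.0]))
    # The order of the points does not matter, only the distributions.
    print(emd_1d([1.0, 2.0, 3.0], [3.0, 1.0, 2.0]))
```

    Because the distance compares distributions rather than paired elements, no alignment between the two inputs is needed, which is the property the abstract relies on.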
    Formulating Robustness Against Unforeseen Attacks. (arXiv:2204.13779v1 [cs.LG])
    Existing defenses against adversarial examples such as adversarial training typically assume that the adversary will conform to a specific or known threat model, such as $\ell_p$ perturbations within a fixed budget. In this paper, we focus on the scenario where there is a mismatch in the threat model assumed by the defense during training, and the actual capabilities of the adversary at test time. We ask the question: if the learner trains against a specific "source" threat model, when can we expect robustness to generalize to a stronger unknown "target" threat model during test-time? Our key contribution is to formally define the problem of learning and generalization with an unforeseen adversary, which helps us reason about the increase in adversarial risk from the conventional perspective of a known adversary. Applying our framework, we derive a generalization bound which relates the generalization gap between source and target threat models to variation of the feature extractor, which measures the expected maximum difference between extracted features across a given threat model. Based on our generalization bound, we propose adversarial training with variation regularization (AT-VR) which reduces variation of the feature extractor across the source threat model during training. We empirically demonstrate that AT-VR can lead to improved generalization to unforeseen attacks during test-time compared to standard adversarial training on Gaussian and image datasets.
    Making sense of violence risk predictions using clinical notes. (arXiv:2204.13976v1 [cs.LG])
    Violence risk assessment in psychiatric institutions enables interventions to avoid violence incidents. Clinical notes written by practitioners and available in electronic health records (EHR) are valuable resources that are seldom used to their full potential. Previous studies have attempted to assess violence risk in psychiatric patients using such notes, with acceptable performance. However, they do not explain why classification works and how it can be improved. We explore two methods to better understand the quality of a classifier in the context of clinical note analysis: random forests using topic models, and choice of evaluation metric. These methods allow us to understand both our data and our methodology more profoundly, setting up the groundwork to work on improved models that build upon this understanding. This is particularly important when it comes to the generalizability of evaluated classifiers to new data, a trustworthiness problem that is of great interest due to the increased availability of new data in electronic format.
    Backdoor Attacks in Federated Learning by Rare Embeddings and Gradient Ensembling. (arXiv:2204.14017v1 [cs.LG])
    Recent advances in federated learning have demonstrated its promising capability to learn on decentralized datasets. However, a considerable amount of work has raised concerns due to the potential risks of adversaries participating in the framework to poison the global model for an adversarial purpose. This paper investigates the feasibility of model poisoning for backdoor attacks through rare word embeddings of NLP models in text classification and sequence-to-sequence tasks. In text classification, less than 1% of adversary clients suffices to manipulate the model output without any drop in the performance of clean sentences. For a less complex dataset, a mere 0.1% of adversary clients is enough to poison the global model effectively. We also propose a technique specialized in the federated learning scheme called gradient ensemble, which enhances the backdoor performance in all experimental settings.
    Explainable AI via Learning to Optimize. (arXiv:2204.14174v1 [math.OC])
    Indecipherable black boxes are common in machine learning (ML), but applications increasingly require explainable artificial intelligence (XAI). The core of XAI is to establish transparent and interpretable data-driven algorithms. This work provides concrete tools for XAI in situations where prior knowledge must be encoded and untrustworthy inferences flagged. We use the "learn to optimize" (L2O) methodology wherein each inference solves a data-driven optimization problem. Our L2O models are straightforward to implement, directly encode prior knowledge, and yield theoretical guarantees (e.g. satisfaction of constraints). We also propose use of interpretable certificates to verify whether model inferences are trustworthy. Numerical examples are provided in the applications of dictionary-based signal recovery, CT imaging, and arbitrage trading of cryptoassets.
    Fix the Noise: Disentangling Source Feature for Transfer Learning of StyleGAN. (arXiv:2204.14079v1 [cs.CV])
    Transfer learning of StyleGAN has recently shown great potential to solve diverse tasks, especially in domain translation. Previous methods utilized a source model by swapping or freezing weights during transfer learning; however, they are limited in visual quality and in their control over source features. In other words, they require additional models that are computationally demanding and have restricted control steps that prevent a smooth transition. In this paper, we propose a new approach to overcome these limitations. Instead of swapping or freezing, we introduce a simple feature matching loss to improve generation quality. In addition, to control the degree of source features, we train a target model with the proposed strategy, FixNoise, to preserve the source features only in a disentangled subspace of a target feature space. Owing to the disentangled feature space, our method can smoothly control the degree of the source features in a single model. Extensive experiments demonstrate that the proposed method can generate more consistent and realistic images than previous works.
    Federated Learning: Balancing the Thin Line Between Data Intelligence and Privacy. (arXiv:2204.13697v1 [cs.LG])
    Federated learning holds great promise in learning from fragmented sensitive data and has revolutionized how machine learning models are trained. This article provides a systematic overview and detailed taxonomy of federated learning. We investigate the existing security challenges in federated learning and provide a comprehensive overview of established defense techniques for data poisoning, inference attacks, and model poisoning attacks. The work also presents an overview of current training challenges for federated learning, focusing on handling non-i.i.d. data, high dimensionality issues, and heterogeneous architecture, and discusses several solutions for the associated challenges. Finally, we discuss the remaining challenges in managing federated learning training and suggest focused research directions to address the open questions. Potential candidate areas for federated learning, including the IoT ecosystem and healthcare applications, are discussed with a particular focus on banking and financial domains.
    CATNet: Cross-event Attention-based Time-aware Network for Medical Event Prediction. (arXiv:2204.13847v1 [cs.LG])
    Medical event prediction (MEP) is a fundamental task in the medical domain, which needs to predict medical events, including medications, diagnosis codes, laboratory tests, procedures, outcomes, and so on, according to historical medical records. The task is challenging as medical data is a type of complex time series data with heterogeneous and temporal irregular characteristics. Many machine learning methods that consider the two characteristics have been proposed for medical event prediction. However, most of them consider the two characteristics separately and ignore the correlations among different types of medical events, especially relations between historical medical events and target medical events. In this paper, we propose a novel neural network based on attention mechanism, called cross-event attention-based time-aware network (CATNet), for medical event prediction. It is a time-aware, event-aware and task-adaptive method with the following advantages: 1) modeling heterogeneous information and temporal information in a unified way and considering temporal irregular characteristics locally and globally respectively, 2) taking full advantage of correlations among different types of events via cross-event attention. Experiments on two public datasets (MIMIC-III and eICU) show CATNet can be adaptive with different MEP tasks and outperforms other state-of-the-art methods on various MEP tasks. The source code of CATNet will be released after this manuscript is accepted.
    VPNets: Volume-preserving neural networks for learning source-free dynamics. (arXiv:2204.13843v1 [cs.LG])
    We propose volume-preserving networks (VPNets) for learning unknown source-free dynamical systems using trajectory data. We propose three modules and combine them to obtain two network architectures, coined R-VPNet and LA-VPNet. The distinct feature of the proposed models is that they are intrinsically volume-preserving. In addition, the corresponding approximation theorems are proved, which theoretically guarantee the expressivity of the proposed VPNets to learn source-free dynamics. The effectiveness, generalization ability and structure-preserving property of the VPNets are demonstrated by numerical experiments.
    Short-Term Density Forecasting of Low-Voltage Load using Bernstein-Polynomial Normalizing Flows. (arXiv:2204.13939v1 [cs.LG])
    The transition to a fully renewable energy grid requires better forecasting of demand at the low-voltage level to increase efficiency and ensure reliable control. However, high fluctuations and increasing electrification cause huge forecast variability, not reflected in traditional point estimates. Probabilistic load forecasts take future uncertainties into account and thus allow more informed decision-making for the planning and operation of low-carbon energy systems. We propose an approach for flexible conditional density forecasting of short-term load based on Bernstein polynomial normalizing flows, where a neural network controls the parameters of the flow. In an empirical study with 363 smart meter customers, our density predictions compare favorably against Gaussian and Gaussian mixture densities. Also, they outperform a non-parametric approach based on the pinball loss for 24h-ahead load forecasting for two different neural network architectures.
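    The pinball loss used by the non-parametric baseline above can be sketched as follows. This is the standard quantile loss, not code from the paper; the function name `pinball_loss` and its argument names are hypothetical.

```python
def pinball_loss(y_true, y_pred, tau):
    """Average pinball (quantile) loss at quantile level tau in (0, 1).

    Under-prediction (y > q) is weighted by tau and over-prediction
    (y < q) by 1 - tau, so minimizing it yields the tau-quantile.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        total += max(tau * diff, (tau - 1.0) * diff)
    return total / len(y_true)


if __name__ == "__main__":
    # For the 0.9 quantile, predicting too low is penalized
    # far more heavily than predicting too high by the same amount.
    print(pinball_loss([10.0], [8.0], 0.9))
    print(pinball_loss([10.0], [12.0], 0.9))
```

    Evaluating this loss over a grid of quantile levels gives a non-parametric description of the predictive distribution, in contrast to fitting a parametric density such as a Gaussian.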
    Learning from Natural Language Feedback. (arXiv:2204.14146v1 [cs.CL])
    Pretrained language models often do not perform tasks in ways that are in line with our preferences, e.g., generating offensive text or factually incorrect summaries. Recent work approaches the above issue by learning from a simple form of human evaluation: comparisons between pairs of model-generated task outputs. Comparison feedback conveys limited information about human preferences per human evaluation. Here, we propose to learn from natural language feedback, which conveys more information per human evaluation. We learn from language feedback on model outputs using a three-step learning algorithm. First, we condition the language model on the initial output and feedback to generate many refinements. Second, we choose the refinement with the highest similarity to the feedback. Third, we finetune a language model to maximize the likelihood of the chosen refinement given the input. In synthetic experiments, we first evaluate whether language models accurately incorporate feedback to produce refinements, finding that only large language models (175B parameters) do so. Using only 100 samples of human-written feedback, our learning algorithm finetunes a GPT-3 model to roughly human-level summarization.
    High Dimensional Bayesian Optimization with Kernel Principal Component Analysis. (arXiv:2204.13753v1 [cs.LG])
    Bayesian Optimization (BO) is a surrogate-based global optimization strategy that relies on a Gaussian Process regression (GPR) model to approximate the objective function and an acquisition function to suggest candidate points. It is well-known that BO does not scale well for high-dimensional problems because the GPR model requires substantially more data points to achieve sufficient accuracy and acquisition optimization becomes computationally expensive in high dimensions. Several recent works aim at addressing these issues, e.g., methods that implement online variable selection or conduct the search on a lower-dimensional sub-manifold of the original search space. Advancing our previous work of PCA-BO that learns a linear sub-manifold, this paper proposes a novel kernel PCA-assisted BO (KPCA-BO) algorithm, which embeds a non-linear sub-manifold in the search space and performs BO on this sub-manifold. Intuitively, constructing the GPR model on a lower-dimensional sub-manifold helps improve the modeling accuracy without requiring much more data from the objective function. Also, our approach defines the acquisition function on the lower-dimensional sub-manifold, making the acquisition optimization more manageable. We compare the performance of KPCA-BO to the vanilla BO and PCA-BO on the multi-modal problems of the COCO/BBOB benchmark suite. Empirical results show that KPCA-BO outperforms BO in terms of convergence speed on most test problems, and this benefit becomes more significant when the dimensionality increases. For the 60D functions, KPCA-BO surpasses PCA-BO in many test cases. Moreover, it efficiently reduces the CPU time required to train the GPR model and optimize the acquisition function compared to the vanilla BO.
    DeepAdversaries: Examining the Robustness of Deep Learning Models for Galaxy Morphology Classification. (arXiv:2112.14299v2 [cs.LG] UPDATED)
    Data processing and analysis pipelines in cosmological survey experiments introduce data perturbations that can significantly degrade the performance of deep learning-based models. Given the increased adoption of supervised deep learning methods for processing and analysis of cosmological survey data, the assessment of data perturbation effects and the development of methods that increase model robustness are increasingly important. In the context of morphological classification of galaxies, we study the effects of perturbations in imaging data. In particular, we examine the consequences of using neural networks when training on baseline data and testing on perturbed data. We consider perturbations associated with two primary sources: 1) increased observational noise as represented by higher levels of Poisson noise and 2) data processing noise incurred by steps such as image compression or telescope errors as represented by one-pixel adversarial attacks. We also test the efficacy of domain adaptation techniques in mitigating the perturbation-driven errors. We use classification accuracy, latent space visualizations, and latent space distance to assess model robustness. Without domain adaptation, we find that processing pixel-level errors easily flip the classification into an incorrect class and that higher observational noise makes the model trained on low-noise data unable to classify galaxy morphologies. On the other hand, we show that training with domain adaptation improves model robustness and mitigates the effects of these perturbations, improving the classification accuracy by 23% on data with higher observational noise. Domain adaptation also increases by a factor of ~2.3 the latent space distance between the baseline and the incorrectly classified one-pixel perturbed image, making the model more robust to inadvertent perturbations.
    An Extensive Data Processing Pipeline for MIMIC-IV. (arXiv:2204.13841v1 [cs.LG])
    An increasing amount of research is being devoted to applying machine learning methods to electronic health record (EHR) data for various clinical tasks. This growing area of research has exposed the limitation of accessibility of EHR datasets for all, as well as the reproducibility of different modeling frameworks. One reason for these limitations is the lack of standardized pre-processing pipelines. MIMIC is a freely available EHR dataset in a raw format that has been used in numerous studies. The absence of standardized pre-processing steps serves as a major barrier to the wider adoption of the dataset. It also leads to different cohorts being used in downstream tasks, limiting the ability to compare the results among similar studies. Contrasting studies also use various distinct performance metrics, which can greatly reduce the ability to compare model results. In this work, we provide an end-to-end fully customizable pipeline to extract, clean, and pre-process data; and to predict and evaluate the fourth version of the MIMIC dataset (MIMIC-IV) for ICU and non-ICU-related clinical time-series prediction tasks.
    Particle Swarm Optimization Based Demand Response Using Artificial Neural Network Based Load Prediction. (arXiv:2204.13990v1 [cs.NE])
    In the present study, a Particle Swarm Optimization (PSO) based Demand Response (DR) model, using an Artificial Neural Network (ANN) to predict load, is proposed. The electrical load and climatological data of a residential area in Austin, Texas are used as the inputs of the ANN. Then, the outcomes, together with the day-ahead price data, are used to solve the load shifting and cost reduction problem. According to the results, the proposed model is able to decrease payment costs and peak load.
    Multi-Agent MDP Homomorphic Networks. (arXiv:2110.04495v2 [cs.LG] UPDATED)
    This paper introduces Multi-Agent MDP Homomorphic Networks, a class of networks that allows distributed execution using only local information, yet is able to share experience between global symmetries in the joint state-action space of cooperative multi-agent systems. In cooperative multi-agent systems, complex symmetries arise between different configurations of the agents and their local observations. For example, consider a group of agents navigating: rotating the state globally results in a permutation of the optimal joint policy. Existing work on symmetries in single agent reinforcement learning can only be generalized to the fully centralized setting, because such approaches rely on the global symmetry in the full state-action spaces, and these can result in correspondences across agents. To encode such symmetries while still allowing distributed execution we propose a factorization that decomposes global symmetries into local transformations. Our proposed factorization allows for distributing the computation that enforces global symmetries over local agents and local interactions. We introduce a multi-agent equivariant policy network based on this factorization. We show empirically on symmetric multi-agent problems that globally symmetric distributable policies improve data efficiency compared to non-equivariant baselines.
    Tailored Uncertainty Estimation for Deep Learning Systems. (arXiv:2204.13963v1 [cs.LG])
    Uncertainty estimation bears the potential to make deep learning (DL) systems more reliable. Standard techniques for uncertainty estimation, however, come with specific combinations of strengths and weaknesses, e.g., with respect to estimation quality, generalization abilities and computational complexity. To actually harness the potential of uncertainty quantification, estimators are required whose properties closely match the requirements of a given use case. In this work, we propose a framework that, firstly, structures and shapes these requirements, secondly, guides the selection of a suitable uncertainty estimation method and, thirdly, provides strategies to validate this choice and to uncover structural weaknesses. By contributing tailored uncertainty estimation in this sense, our framework helps to foster trustworthy DL systems. Moreover, it anticipates prospective machine learning regulations that require, e.g., in the EU, evidence of the technical appropriateness of machine learning systems. Our framework provides such evidence for system components modeling uncertainty.
    Hyperbolic Hierarchical Knowledge Graph Embeddings for Link Prediction in Low Dimensions. (arXiv:2204.13704v1 [cs.LG])
    Knowledge graph embeddings (KGE) have been validated as powerful methods for inferring missing links in knowledge graphs (KGs) since they map entities into Euclidean space and treat relations as transformations of entities. Currently, some Euclidean KGE methods model semantic hierarchies prevalent in KGs and promote the performance of link prediction. For hierarchical data, instead of traditional Euclidean space, hyperbolic space as an embedding space has shown the promise of high fidelity and low memory consumption; however, existing hyperbolic KGE methods neglect to model them. To address this issue, we propose a novel KGE model -- hyperbolic hierarchical KGE (HypHKGE). To be specific, we first design the attention-based learnable curvatures for hyperbolic space to preserve rich semantic hierarchies. Moreover, we define the hyperbolic hierarchical transformations based on the theory of hyperbolic geometry, which utilize hierarchies that we preserved to infer the links. Experiments show that HypHKGE can effectively model semantic hierarchies in hyperbolic space and outperforms the state-of-the-art hyperbolic methods, especially in low dimensions.
    An Online Ensemble Learning Model for Detecting Attacks in Wireless Sensor Networks. (arXiv:2204.13814v1 [cs.NI])
    In today's world, the use of technology is unavoidable, and rapid advances in the Internet and communication fields have led to the expansion of Wireless Sensor Network (WSN) technology. A huge number of sensing devices collect and/or generate large volumes of sensory data over time for a wide range of fields and applications. However, WSNs have been proven to be vulnerable to security breaches: the harsh and unattended deployment of these networks, combined with their constrained resources and the volume of data generated, introduces a major security concern. Because WSN applications are extremely critical, it is essential to build reliable solutions that involve fast and continuous mechanisms for online data stream analysis, enabling the detection of attacks and intrusions. In this context, our aim is to develop an intelligent, efficient, and updatable intrusion detection system by applying an important machine learning concept known as ensemble learning in order to improve detection performance. Although ensemble models have been proven to be useful in offline learning, they have received less attention in streaming applications. In this paper, we examine the application of different homogeneous and heterogeneous online ensembles in sensory data analysis, on a specialized wireless sensor network-detection system (WSN-DS) dataset, in order to classify four types of attacks (Blackhole, Grayhole, Flooding, and Scheduling) among normal network traffic. Among the proposed novel online ensembles, both the heterogeneous ensemble consisting of an Adaptive Random Forest (ARF) combined with the Hoeffding Adaptive Tree (HAT) algorithm and the homogeneous ensemble HAT made up of 10 models achieved higher detection rates of 96.84% and 97.2%, respectively. The above models are efficient and effective in dealing with concept drift, while taking into account the resource constraints of WSNs.
    Bayesian Information Criterion for Event-based Multi-trial Ensemble data. (arXiv:2204.14096v1 [stat.ML])
    Transient recurring phenomena are ubiquitous in many scientific fields like neuroscience and meteorology. Time-inhomogeneous Vector Autoregressive (VAR) models may be used to characterize peri-event system dynamics associated with such phenomena, and can be learned by exploiting multi-dimensional data that sample the evolution of the system in multiple time windows, each associated with one occurrence of the transient phenomenon, which we call a "trial". However, optimal VAR model order selection methods, commonly relying on the Akaike or Bayesian Information Criteria (AIC/BIC), are typically not designed for multi-trial data. Here we derive the BIC methods for multi-trial ensemble data which are gathered after the detection of the events. We show using simulated bivariate AR models that the multi-trial BIC is able to recover the real model order. We also demonstrate with simulated transient events and real data that the multi-trial BIC is able to estimate a sufficiently small model order for dynamic system modeling.
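    For orientation, the textbook single-dataset BIC that the abstract starts from can be written as below. This is the standard form only, not the multi-trial criterion derived in the paper; the symbols $k$ (number of free parameters), $n$ (number of observations), $\hat{L}$ (maximized likelihood), $d$ (signal dimension) and $p$ (model order) are introduced here for illustration.

```latex
\mathrm{BIC} = k \ln n - 2 \ln \hat{L},
\qquad
k_{\text{coef}} = p\, d^{2} \quad \text{for a $d$-dimensional VAR($p$) model without intercept}
```

    The model order with the lowest BIC is selected. Since the coefficient count grows as $p\,d^{2}$, the $k \ln n$ term penalizes higher orders more strongly as more observations become available.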
    Local Explanation of Dimensionality Reduction. (arXiv:2204.14012v1 [cs.LG])
    Dimensionality reduction (DR) is a popular method for preparing and analyzing high-dimensional data. Reduced data representations are less computationally intensive and easier to manage and visualize, while retaining a significant percentage of their original information. Aside from these advantages, these reduced representations can be difficult or impossible to interpret in most circumstances, especially when the DR approach does not provide further information about which features of the original space led to their construction. This problem is addressed by Interpretable Machine Learning, a subfield of Explainable Artificial Intelligence that addresses the opacity of machine learning models. However, current research on Interpretable Machine Learning has been focused on supervised tasks, leaving unsupervised tasks like Dimensionality Reduction unexplored. In this paper, we introduce LXDR, a technique capable of providing local interpretations of the output of DR techniques. Experiment results and two LXDR use case examples are presented to evaluate its usefulness.
    SwiftAgg: Communication-Efficient and Dropout-Resistant Secure Aggregation for Federated Learning with Worst-Case Security Guarantees. (arXiv:2202.04169v2 [cs.IT] UPDATED)
    We propose SwiftAgg, a novel secure aggregation protocol for federated learning systems, where a central server aggregates local models of $N$ distributed users, each of size $L$, trained on their local data, in a privacy-preserving manner. Compared with state-of-the-art secure aggregation protocols, SwiftAgg significantly reduces the communication overheads without any compromise on security. Specifically, in the presence of at most $D$ dropout users, SwiftAgg achieves a users-to-server communication load of $(T+1)L$ and a users-to-users communication load of up to $(N-1)(T+D+1)L$, with a worst-case information-theoretic security guarantee, against any subset of up to $T$ semi-honest users who may also collude with the curious server. The key idea of SwiftAgg is to partition the users into groups of size $D+T+1$, then in the first phase, secret sharing and aggregation of the individual models are performed within each group, and then in the second phase, model aggregation is performed on $D+T+1$ sequences of users across the groups. If a user in a sequence drops out in the second phase, the rest of the sequence remains silent. This design allows only a subset of users to communicate with each other, and only the users in a single group to directly communicate with the server, eliminating the requirements of 1) all-to-all communication network across users; and 2) all users communicating with the server, for other secure aggregation protocols. This helps to substantially slash the communication costs of the system.
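    To make the stated communication loads concrete, the short sketch below simply evaluates the two expressions and the group size quoted in the abstract for given $N$, $T$, $D$ and $L$. The function name `swiftagg_loads` and the dictionary keys are hypothetical, and the loads are expressed in the same units as the model size $L$.

```python
def swiftagg_loads(n_users, t_colluding, d_dropouts, model_size):
    """Evaluate the communication loads quoted in the SwiftAgg abstract.

    users_to_server    = (T + 1) * L
    users_to_users_max = (N - 1) * (T + D + 1) * L   (upper bound)
    group_size         = D + T + 1
    """
    if min(n_users, model_size) < 1 or min(t_colluding, d_dropouts) < 0:
        raise ValueError("N and L must be positive; T and D non-negative")
    group_size = d_dropouts + t_colluding + 1
    return {
        "users_to_server": (t_colluding + 1) * model_size,
        "users_to_users_max": (n_users - 1) * group_size * model_size,
        "group_size": group_size,
    }


if __name__ == "__main__":
    # Illustrative values only: 12 users, up to 2 colluding users,
    # up to 1 dropout, and a model of 1000 symbols.
    print(swiftagg_loads(n_users=12, t_colluding=2, d_dropouts=1, model_size=1000))
```

    Note that the users-to-server load depends on $T$ and $L$ but not on the number of users $N$, which reflects the design in which only the users of a single group talk to the server.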
    Framework for Behavioral Disorder Detection Using Machine Learning and Application of Virtual Cognitive Behavioral Therapy in COVID-19 Pandemic. (arXiv:2204.13900v1 [cs.LG])
    In the modern world, people are becoming more self-centered and unsocial. At the same time, people are stressed and increasingly anxious during the COVID-19 pandemic and exhibit symptoms of behavioral disorders. To measure the symptoms of a behavioral disorder, psychiatrists usually rely on hours-long sessions and input from specific questionnaires. This process is time-consuming and sometimes fails to detect the right behavioral disorder. Also, reserved people sometimes hesitate to follow this process. We have created a digital framework that can detect behavioral disorders and prescribe virtual Cognitive Behavioral Therapy (vCBT) for recovery. Using this framework, people can input the required data, which are strongly associated with three behavioral disorders, namely depression, anxiety, and internet addiction. We apply machine learning techniques to detect the specific behavioral disorder from samples. The system guides the user with basic understanding and treatment through vCBT from anywhere at any time, which could be a stepping stone for the user to become aware of the condition and pursue the right treatment.
    Triformer: Triangular, Variable-Specific Attentions for Long Sequence Multivariate Time Series Forecasting--Full Version. (arXiv:2204.13767v1 [cs.LG])
    A variety of real-world applications rely on far future information to make decisions, thus calling for efficient and accurate long sequence multivariate time series forecasting. While recent attention-based forecasting models show strong abilities in capturing long-term dependencies, they still suffer from two key limitations. First, canonical self attention has a quadratic complexity w.r.t. the input time series length, thus falling short in efficiency. Second, different variables' time series often have distinct temporal dynamics, which existing studies fail to capture, as they use the same model parameter space, e.g., projection matrices, for all variables' time series, thus falling short in accuracy. To ensure high efficiency and accuracy, we propose Triformer, a triangular, variable-specific attention. (i) Linear complexity: we introduce a novel patch attention with linear complexity. When stacking multiple layers of the patch attentions, a triangular structure is proposed such that the layer sizes shrink exponentially, thus maintaining linear complexity. (ii) Variable-specific parameters: we propose a light-weight method to enable distinct sets of model parameters for different variables' time series to enhance accuracy without compromising efficiency and memory usage. Strong empirical evidence on four datasets from multiple domains justifies our design choices, and it demonstrates that Triformer outperforms state-of-the-art methods w.r.t. both accuracy and efficiency. This is an extended version of "Triformer: Triangular, Variable-Specific Attentions for Long Sequence Multivariate Time Series Forecasting", to appear in IJCAI 2022 [Cirstea et al., 2022a], including additional experimental results.
    Fairer LP-based Online Allocation via Analytic Center. (arXiv:2110.14621v3 [cs.DS] UPDATED)
    In this paper, we consider an online resource allocation problem where a decision maker accepts or rejects incoming customer requests irrevocably in order to maximize expected reward given limited resources. At each time, a new order/customer/bid is revealed with a request of some resource(s) and a reward. We consider a stochastic setting where all the orders are i.i.d. sampled from an unknown distribution. Such a formulation arises from many classic applications such as the canonical (quantity-based) network revenue management problem and the Adwords problem. While the literature on the topic mainly focuses on regret minimization, our paper considers the \textit{fairness} aspect of the problem. At a high level, we define fairness such that a fair online algorithm should treat similar agents/customers similarly, and the decisions made for similar agents/customers should be consistent over time. To achieve this goal, we define the fair offline solution as the analytic center of the offline optimal solution set, and introduce \textit{cumulative unfairness} as the cumulative deviation from the online solutions to the fair offline solution over time. We propose a fair algorithm based on an interior-point LP solver and a mechanism that dynamically detects unfair resource spending. Our algorithm achieves cumulative unfairness on the scale of order $O(\log(T))$, while keeping the regret bounded without dependency on $T$. In addition, compared to the literature, our result is produced under less restrictive assumptions on the degeneracy of the underlying linear program.
    Convergence of gradient descent for deep neural networks. (arXiv:2203.16462v2 [cs.LG] UPDATED)
    Optimization by gradient descent has been one of the main drivers of the "deep learning revolution". Yet, despite some recent progress for extremely wide networks, it remains an open problem to understand why gradient descent often converges to global minima when training deep neural networks. This article presents a new criterion for convergence of gradient descent to a global minimum, which is provably more powerful than the best available criteria from the literature, namely, the Lojasiewicz inequality and its generalizations. This criterion is used to show that gradient descent with proper initialization converges to a global minimum when training any feedforward neural network with smooth and strictly increasing activation functions, provided that the input dimension is greater than or equal to the number of data points.
    Machine Learning-Based GPS Multipath Detection Method Using Dual Antennas. (arXiv:2204.14001v1 [cs.NI])
    In urban areas, global navigation satellite system (GNSS) signals are often reflected or blocked by buildings, thus resulting in large positioning errors. In this study, we proposed a machine learning approach for global positioning system (GPS) multipath detection that uses dual antennas. A machine learning model that could classify GPS signal reception conditions was trained with several GPS measurements selected as suggested features. We applied five features for machine learning, including a feature obtained from the dual antennas, and evaluated the classification performance of the model, after applying four machine learning algorithms: gradient boosting decision tree (GBDT), random forest, decision tree, and K-nearest neighbor (KNN). It was found that a classification accuracy of 82%-96% was achieved when the test data set was collected at the same locations as those of the training data set. However, when the test data set was collected at locations different from those of the training data, a classification accuracy of 44%-77% was obtained.
    Depth Estimation with Simplified Transformer. (arXiv:2204.13791v1 [cs.CV])
    The Transformer and its variants have recently shown state-of-the-art results in many vision tasks, ranging from image classification to dense prediction. Despite their success, limited work has been reported on improving model efficiency for deployment in latency-critical applications, such as autonomous driving and robotic navigation. In this paper, we aim to improve upon existing transformers in vision, and propose a method for self-supervised monocular Depth Estimation with Simplified Transformer (DEST), which is efficient and particularly suitable for deployment on GPU-based platforms. Through strategic design choices, our model leads to a significant reduction in model size, complexity, and inference latency, while achieving superior accuracy compared to the state of the art. We also show that our design generalizes well to other dense prediction tasks without bells and whistles.
    Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks. (arXiv:2111.02278v2 [cs.LG] UPDATED)
    Understanding the properties of neural networks trained via stochastic gradient descent (SGD) is at the heart of the theory of deep learning. In this work, we take a mean-field view, and consider a two-layer ReLU network trained via SGD for a univariate regularized regression problem. Our main result is that SGD is biased towards a simple solution: at convergence, the ReLU network implements a piecewise linear map of the inputs, and the number of "knot" points - i.e., points where the tangent of the ReLU network estimator changes - between two consecutive training inputs is at most three. In particular, as the number of neurons of the network grows, the SGD dynamics is captured by the solution of a gradient flow and, at convergence, the distribution of the weights approaches the unique minimizer of a related free energy, which has a Gibbs form. Our key technical contribution consists in the analysis of the estimator resulting from this minimizer: we show that its second derivative vanishes everywhere, except at some specific locations which represent the "knot" points. We also provide empirical evidence that knots at locations distinct from the data points might occur, as predicted by our theory.
    Stochastic Video Prediction with Structure and Motion. (arXiv:2203.10528v2 [cs.CV] UPDATED)
    While stochastic video prediction models enable future prediction under uncertainty, they mostly fail to model the complex dynamics of real-world scenes. For example, they cannot provide reliable predictions for scenes with a moving camera and independently moving foreground objects in driving scenarios. The existing methods fail to fully capture the dynamics of the structured world by only focusing on changes in pixels. In this paper, we assume that there is an underlying process creating observations in a video and propose to factorize it into static and dynamic components. We model the static part based on the scene structure and the ego-motion of the vehicle, and the dynamic part based on the remaining motion of the dynamic objects. By learning separate distributions of changes in foreground and background, we can decompose the scene into static and dynamic parts and separately model the change in each. Our experiments demonstrate that disentangling structure and motion helps stochastic video prediction, leading to better future predictions in complex driving scenarios on two real-world driving datasets, KITTI and Cityscapes.
    Feature extraction using Spectral Clustering for Gene Function Prediction using Hierarchical Multi-label Classification. (arXiv:2203.13551v2 [cs.LG] UPDATED)
    Gene annotation addresses the problem of predicting unknown associations between genes and functions (e.g., biological processes) of a specific organism. Despite recent advances, the cost and time demanded by annotation procedures that rely largely on in vivo biological experiments remain prohibitively high. This paper presents a novel in silico approach to the annotation problem that combines cluster analysis and hierarchical multi-label classification (HMC). The approach uses spectral clustering to extract new features from the gene co-expression network (GCN) and enrich the prediction task. HMC is used to build multiple estimators that consider the hierarchical structure of gene functions. The proposed approach is applied to a case study on Zea mays, one of the most dominant and productive crops in the world. The results illustrate how in silico approaches are key to reducing the time and costs of gene annotation. More specifically, they highlight the importance of: (i) building new features that represent the structure of gene relationships in GCNs to annotate genes; and (ii) taking into account the structure of biological processes to obtain consistent predictions.
    Statistical applications of contrastive learning. (arXiv:2204.13999v1 [cs.LG])
    The likelihood function plays a crucial role in statistical inference and experimental design. However, it is computationally intractable for several important classes of statistical models, including energy-based models and simulator-based models. Contrastive learning is an intuitive and computationally feasible alternative to likelihood-based learning. We here first provide an introduction to contrastive learning and then show how we can use it to derive methods for diverse statistical problems, namely parameter estimation for energy-based models, Bayesian inference for simulator-based models, as well as experimental design.
    Dynamic Diagnosis of the Progress and Shortcomings of Student Learning using Machine Learning based on Cognitive, Social, and Emotional Features. (arXiv:2204.13989v1 [cs.CY])
    Student diversity, like academic background, learning styles, career and life goals, ethnicity, age, social and emotional characteristics, course load and work schedule, offers unique opportunities in education, like learning new skills, peer mentoring and example setting. But student diversity can be challenging too as it adds variability in the way in which students learn and progress over time. A single teaching approach is likely to be ineffective and result in students not meeting their potential. Automated support could address limitations of traditional teaching by continuously assessing student learning and implementing needed interventions. This paper discusses a novel methodology based on data analytics and Machine Learning to measure and causally diagnose the progress and shortcomings of student learning, and then utilizes the insight gained on individuals to optimize learning. Diagnosis pertains to dynamic diagnostic formative assessment, which aims to uncover the causes of learning shortcomings. The methodology groups learning difficulties into four categories: recall from memory, concept adjustment, concept modification, and problem decomposition into sub-goals (sub-problems) and concept combination. Data models are predicting the occurrence of each of the four challenge types, as well as a student's learning trajectory. The models can be used to automatically create real-time, student-specific interventions (e.g., learning cues) to address less understood concepts. We envision that the system will enable new adaptive pedagogical approaches to unleash student learning potential through customization of the course material to the background, abilities, situation, and progress of each student; and leveraging diversity-related learning experiences.
    Science Checker: Extractive-Boolean Question Answering For Scientific Fact Checking. (arXiv:2204.12263v2 [cs.CL] UPDATED)
    With the explosive growth of scientific publications, synthesizing scientific knowledge and checking facts has become an increasingly complex task. In this paper, we propose a multi-task approach for verifying scientific questions based on joint reasoning from facts and evidence in research articles. We propose an intelligent combination of (1) automatic information summarization and (2) Boolean Question Answering, which makes it possible to generate an answer to a scientific question from only the extracts obtained after summarization. Thus, on a given topic, our proposed approach conducts structured content modeling based on paper abstracts to answer a scientific question while highlighting texts from papers that discuss the topic. We based our final system on an end-to-end Extractive Question Answering (EQA) model combined with a three-output classification model to perform in-depth semantic understanding of a question to illustrate the aggregation of multiple responses. With our light and fast proposed architecture, we achieved an average error rate of 4% and an F1-score of 95.6%. Our results are supported via experiments with two QA models (BERT, RoBERTa) over 3 Million Open Access (OA) articles in the medical and health domains on Europe PMC.
    Toward Degradation-Robust Voice Conversion. (arXiv:2110.07537v3 [eess.AS] UPDATED)
    Any-to-any voice conversion technologies convert the vocal timbre of an utterance to any speaker even unseen during training. Although there have been several state-of-the-art any-to-any voice conversion models, they were all based on clean utterances to convert successfully. However, in real-world scenarios, it is difficult to collect clean utterances of a speaker, and they are usually degraded by noises or reverberations. It thus becomes highly desired to understand how these degradations affect voice conversion and build a degradation-robust model. We report in this paper the first comprehensive study on the degradation robustness of any-to-any voice conversion. We show that the performance of state-of-the-art models nowadays was severely hampered given degraded utterances. To this end, we then propose speech enhancement concatenation and denoising training to improve the robustness. In addition to common degradations, we also consider adversarial noises, which alter the model output significantly yet are human-imperceptible. It was shown that both concatenations with off-the-shelf speech enhancement models and denoising training on voice conversion models could improve the robustness, while each of them had pros and cons.  ( 2 min )
    One-Way Matching of Datasets with Low Rank Signals. (arXiv:2204.13858v1 [math.ST])
    We study one-way matching of a pair of datasets with low rank signals. Under a stylized model, we first derive information-theoretic limits of matching. We then show that linear assignment with projected data achieves fast rates of convergence and sometimes even minimax rate optimality for this task. The theoretical error bounds are corroborated by simulated examples. Furthermore, we illustrate practical use of the matching procedure on two single-cell data examples.
    Post-hoc Interpretability for Neural NLP: A Survey. (arXiv:2108.04840v4 [cs.CL] UPDATED)
    Neural networks for NLP are becoming increasingly complex and widespread, and there is growing concern about whether these models can be used responsibly. Explaining models helps to address safety and ethical concerns and is essential for accountability. Interpretability serves to provide these explanations in terms that are understandable to humans. Additionally, post-hoc methods provide explanations after a model is learned and are generally model-agnostic. This survey categorizes how recent post-hoc interpretability methods communicate explanations to humans, discusses each method in depth, and describes how the methods are validated, as validation is often a common concern.
    AGIC: Approximate Gradient Inversion Attack on Federated Learning. (arXiv:2204.13784v1 [cs.LG])
    Federated learning is a private-by-design distributed learning paradigm where clients train local models on their own data before a central server aggregates their local updates to compute a global model. Depending on the aggregation method used, the local updates are either the gradients or the weights of local learning models. Recent reconstruction attacks apply a gradient inversion optimization on the gradient update of a single minibatch to reconstruct the private data used by clients during training. As the state-of-the-art reconstruction attacks focus solely on a single update, realistic adversarial scenarios are overlooked, such as observation across multiple updates and updates trained from multiple mini-batches. A few studies consider a more challenging adversarial scenario where only model updates based on multiple mini-batches are observable, and resort to computationally expensive simulation to untangle the underlying samples for each local step. In this paper, we propose AGIC, a novel Approximate Gradient Inversion Attack that efficiently and effectively reconstructs images from either model or gradient updates, and across multiple epochs. In a nutshell, AGIC (i) approximates gradient updates of used training samples from model updates to avoid costly simulation procedures, (ii) leverages gradient/model updates collected from multiple epochs, and (iii) assigns increasing weights to layers with respect to the neural network structure for reconstruction quality. We extensively evaluate AGIC on three datasets, CIFAR-10, CIFAR-100 and ImageNet. Our results show that AGIC increases the peak signal-to-noise ratio (PSNR) by up to 50% compared to two representative state-of-the-art gradient inversion attacks. Furthermore, AGIC is faster than the state-of-the-art simulation-based attack, e.g., it is 5x faster when attacking FedAvg with 8 local steps in between model updates.
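The abstract above reports reconstruction quality as peak signal-to-noise ratio (PSNR). As a reminder of what that metric measures, here is a minimal sketch of the standard PSNR formula, 10 * log10(MAX^2 / MSE), over flat pixel sequences. The function name `psnr`, the flat-list input, and the default peak value of 255 are assumptions made for illustration; this is not the AGIC evaluation code.

```python
import math


def psnr(reference, reconstruction, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    `max_value` is the largest possible pixel value (assumed 255 for 8-bit images).
    """
    if len(reference) != len(reconstruction) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between the original and the reconstructed pixels.
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher PSNR means the reconstruction is closer to the original, so for an attack a higher value indicates more leaked information.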
    Modular Domain Adaptation. (arXiv:2204.14213v1 [cs.CL])
    Off-the-shelf models are widely used by computational social science researchers to measure properties of text, such as sentiment. However, without access to source data it is difficult to account for domain shift, which represents a threat to validity. Here, we treat domain adaptation as a modular process that involves separate model producers and model consumers, and show how they can independently cooperate to facilitate more accurate measurements of text. We introduce two lightweight techniques for this scenario, and demonstrate that they reliably increase out-of-domain accuracy on four multi-domain text classification datasets when used with linear and contextual embedding models. We conclude with recommendations for model producers and consumers, and release models and replication code to accompany this paper.  ( 2 min )
    Physical Deep Learning with Biologically Plausible Training Method. (arXiv:2204.13991v1 [cs.NE])
    The ever-growing demand for further advances in artificial intelligence has motivated research on unconventional computation based on analog physical devices. While such computation devices mimic brain-inspired analog information processing, learning procedures still rely on methods optimized for digital processing such as backpropagation. Here, we present physical deep learning by extending a biologically plausible training algorithm called direct feedback alignment. As the proposed method is based on random projection with arbitrary nonlinear activation, we can train a physical neural network without knowledge about the physical system. In addition, we can emulate and accelerate the computation for this training on a simple and scalable physical system. We demonstrate the proof of concept using a hierarchically connected optoelectronic recurrent neural network called a deep reservoir computer. By constructing an FPGA-assisted optoelectronic benchtop, we confirmed the potential for accelerated computation with competitive performance on benchmarks. Our results provide practical solutions for the training and acceleration of neuromorphic computation.  ( 2 min )
    Neighbor-Based Optimized Logistic Regression Machine Learning Model For Electric Vehicle Occupancy Detection. (arXiv:2204.13702v1 [cs.LG])
    This paper presents an optimized logistic regression machine learning model that predicts the occupancy of an Electric Vehicle (EV) charging station given the occupancy of neighboring stations. The model was optimized for the time of day. Trained on data from 57 EV charging stations around the University of California San Diego campus, the model achieved an 88.43% average accuracy and 92.23% maximum accuracy in predicting occupancy, outperforming a persistence model benchmark.  ( 2 min )
    HyperJump: Accelerating HyperBand via Risk Modelling. (arXiv:2108.02479v3 [cs.LG] UPDATED)
    In the literature on hyper-parameter tuning, a number of recent solutions rely on low-fidelity observations (e.g., training with sub-sampled datasets or for short periods of time) to extrapolate good configurations to use when performing full training. Among these, HyperBand is arguably one of the most popular solutions, due to its efficiency and theoretically provable robustness. In this work, we introduce HyperJump, a new approach that builds on HyperBand's robust search strategy and complements it with novel model-based risk analysis techniques that accelerate the search by \textit{jumping} the evaluation of low-risk configurations, i.e., configurations that are likely to be discarded by HyperBand. We evaluate HyperJump on a suite of hyper-parameter optimization problems and show that it provides speed-ups of over one order of magnitude, both in sequential and parallel deployments, on a variety of deep learning, kernel-based learning, and neural architectural search problems when compared to HyperBand and to several state-of-the-art optimizers.  ( 2 min )
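For readers unfamiliar with the low-fidelity idea mentioned above, the following is a minimal sketch of successive halving, the culling routine that HyperBand is built around: evaluate all configurations on a small budget, keep the best 1/eta of them, and repeat with eta times the budget. It is a generic illustration, not HyperJump and not the full HyperBand bracket schedule; the names `successive_halving` and `evaluate`, and the defaults `min_budget=1` and `eta=3`, are assumptions.

```python
def successive_halving(configs, evaluate, min_budget=1, eta=3):
    """Return the configuration that survives repeated low-fidelity culling.

    `evaluate(config, budget)` is a caller-supplied function (hypothetical here)
    that returns a loss, where lower is better and a larger budget means a
    higher-fidelity evaluation.
    """
    survivors = list(configs)
    if not survivors:
        raise ValueError("need at least one configuration")
    budget = min_budget
    while len(survivors) > 1:
        # Rank the current survivors using the current (cheap) budget.
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget))
        # Keep roughly the best 1/eta of them, but always at least one.
        survivors = ranked[: max(1, len(ranked) // eta)]
        # Spend eta times more budget on each of the remaining candidates.
        budget *= eta
    return survivors[0]


def toy_loss(learning_rate, budget):
    """Toy objective: best at learning_rate 0.1, with an error term that shrinks as budget grows."""
    return (learning_rate - 0.1) ** 2 + 1.0 / budget


best = successive_halving([0.001, 0.01, 0.05, 0.1, 0.2, 0.5, 1.0, 2.0, 5.0], toy_loss)
```

With nine candidates and eta = 3, the loop runs two rounds (nine to three to one). Most of the budget goes to the few configurations that looked promising early, and that spending pattern is what approaches such as the one described above try to shorten further.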
    A Collection of Quality Diversity Optimization Problems Derived from Hyperparameter Optimization of Machine Learning Models. (arXiv:2204.14061v1 [cs.LG])
    The goal of Quality Diversity Optimization is to generate a collection of diverse yet high-performing solutions to a given problem at hand. Typical benchmark problems are, for example, finding a repertoire of robot arm configurations or a collection of game playing strategies. In this paper, we propose a set of Quality Diversity Optimization problems that tackle hyperparameter optimization of machine learning models - a so far underexplored application of Quality Diversity Optimization. Our benchmark problems involve novel feature functions, such as interpretability or resource usage of models. To allow for fast and efficient benchmarking, we build upon YAHPO Gym, a recently proposed open source benchmarking suite for hyperparameter optimization that makes use of high performing surrogate models and returns these surrogate model predictions instead of evaluating the true expensive black box function. We present results of an initial experimental study comparing different Quality Diversity optimizers on our benchmark problems. Furthermore, we discuss future directions and challenges of Quality Diversity Optimization in the context of hyperparameter optimization.  ( 2 min )
    Searching for Efficient Neural Architectures for On-Device ML on Edge TPUs. (arXiv:2204.14007v1 [cs.DC])
    On-device ML accelerators are becoming a standard in modern mobile system-on-chips (SoC). Neural architecture search (NAS) comes to the rescue for efficiently utilizing the high compute throughput offered by these accelerators. However, existing NAS frameworks have several practical limitations in scaling to multiple tasks and different target platforms. In this work, we provide a two-pronged approach to this challenge: (i) a NAS-enabling infrastructure that decouples model cost evaluation, search space design, and the NAS algorithm to rapidly target various on-device ML tasks, and (ii) search spaces crafted from group convolution based inverted bottleneck (IBN) variants that provide flexible quality/performance trade-offs on ML accelerators, complementing the existing full and depthwise convolution based IBNs. Using this approach we target a state-of-the-art mobile platform, Google Tensor SoC, and demonstrate neural architectures that improve the quality-performance pareto frontier for various computer vision (classification, detection, segmentation) as well as natural language processing tasks.  ( 2 min )
    Momentum-based variance-reduced proximal stochastic gradient method for composite nonconvex stochastic optimization. (arXiv:2006.00425v3 [math.OC] UPDATED)
    Stochastic gradient methods (SGMs) have been extensively used for solving stochastic problems or large-scale machine learning problems. Recent works employ various techniques to improve the convergence rate of SGMs for both convex and nonconvex cases. Most of them require a large number of samples in some or all iterations of the improved SGMs. In this paper, we propose a new SGM, named PStorm, for solving nonconvex nonsmooth stochastic problems. With a momentum-based variance reduction technique, PStorm can achieve the optimal complexity result $O(\varepsilon^{-3})$ to produce a stochastic $\varepsilon$-stationary solution, if a mean-squared smoothness condition holds. Different from existing optimal methods, PStorm can achieve the ${O}(\varepsilon^{-3})$ result by using only one or $O(1)$ samples in every update. With this property, PStorm can be applied to online learning problems that favor real-time decisions based on one or $O(1)$ new observations. In addition, for large-scale machine learning problems, PStorm can generalize better by small-batch training than other optimal methods that require large-batch training and the vanilla SGM, as we demonstrate on training a sparse fully-connected neural network and a sparse convolutional neural network.  ( 2 min )
    Flamingo: a Visual Language Model for Few-Shot Learning. (arXiv:2204.14198v1 [cs.CV])
    Building models that can be rapidly adapted to numerous tasks using only a handful of annotated examples is an open challenge for multimodal machine learning research. We introduce Flamingo, a family of Visual Language Models (VLM) with this ability. Flamingo models include key architectural innovations to: (i) bridge powerful pretrained vision-only and language-only models, (ii) handle sequences of arbitrarily interleaved visual and textual data, and (iii) seamlessly ingest images or videos as inputs. Thanks to their flexibility, Flamingo models can be trained on large-scale multimodal web corpora containing arbitrarily interleaved text and images, which is key to endow them with in-context few-shot learning capabilities. We perform a thorough evaluation of the proposed Flamingo models, exploring and measuring their ability to rapidly adapt to a variety of image and video understanding benchmarks. These include open-ended tasks such as visual question-answering, where the model is prompted with a question which it has to answer, captioning tasks, which evaluate the ability to describe a scene or an event, and close-ended tasks such as multiple choice visual question-answering. For tasks lying anywhere on this spectrum, we demonstrate that a single Flamingo model can achieve a new state of the art for few-shot learning, simply by prompting the model with task-specific examples. On many of these benchmarks, Flamingo actually surpasses the performance of models that are fine-tuned on thousands of times more task-specific data.  ( 2 min )
    MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation. (arXiv:2111.12707v3 [cs.CV] UPDATED)
    Estimating 3D human poses from monocular videos is a challenging task due to depth ambiguity and self-occlusion. Most existing works attempt to solve both issues by exploiting spatial and temporal relationships. However, those works ignore the fact that it is an inverse problem where multiple feasible solutions (i.e., hypotheses) exist. To relieve this limitation, we propose a Multi-Hypothesis Transformer (MHFormer) that learns spatio-temporal representations of multiple plausible pose hypotheses. In order to effectively model multi-hypothesis dependencies and build strong relationships across hypothesis features, the task is decomposed into three stages: (i) Generate multiple initial hypothesis representations; (ii) Model self-hypothesis communication, merge multiple hypotheses into a single converged representation and then partition it into several diverged hypotheses; (iii) Learn cross-hypothesis communication and aggregate the multi-hypothesis features to synthesize the final 3D pose. Through the above processes, the final representation is enhanced and the synthesized pose is much more accurate. Extensive experiments show that MHFormer achieves state-of-the-art results on two challenging datasets: Human3.6M and MPI-INF-3DHP. Without bells and whistles, its performance surpasses the previous best result by a large margin of 3% on Human3.6M. Code and models are available at \url{https://github.com/Vegetebird/MHFormer}.  ( 2 min )
    Preoperative brain tumor imaging: models and software for segmentation and standardized reporting. (arXiv:2204.14199v1 [eess.IV])
    For patients suffering from a brain tumor, prognosis estimation and treatment decisions are made by a multidisciplinary team based on a set of preoperative MR scans. Currently, the lack of standardized and automatic methods for tumor detection and generation of clinical reports represents a major hurdle. In this study, we investigate glioblastomas, lower grade gliomas, meningiomas, and metastases, through four cohorts of up to 4000 patients. Tumor segmentation models were trained using the AGU-Net architecture with different preprocessing steps and protocols. Segmentation performance was assessed in depth using a wide range of voxel-wise and patient-wise metrics covering volume, distance, and probabilistic aspects. Finally, two software solutions have been developed, enabling easy use of the trained models and standardized generation of clinical reports: Raidionics and Raidionics-Slicer. Segmentation performance was quite homogeneous across the four different brain tumor types, with an average true positive Dice ranging between 80% and 90%, patient-wise recall between 88% and 98%, and patient-wise precision around 95%. With our Raidionics software, running on a desktop computer with CPU support, tumor segmentation can be performed in 16 to 54 seconds depending on the dimensions of the MRI volume. For the generation of a standardized clinical report, including the tumor segmentation and features computation, 5 to 15 minutes are necessary. All trained models have been made open-access together with the source code for both software solutions and validation metrics computation. In the future, an automatic classification of the brain tumor type would be necessary to replace manual user input. Finally, the inclusion of post-operative segmentation in both software solutions will be key for generating complete post-operative standardized clinical reports.  ( 3 min )
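The segmentation results above are reported as Dice scores. As a quick reference, this is a minimal sketch of the Dice coefficient, 2|A ∩ B| / (|A| + |B|), on flattened binary masks. The function name `dice_coefficient`, the flat-list representation, and the convention of returning 1.0 when both masks are empty are assumptions for illustration; it is not the Raidionics validation code.

```python
def dice_coefficient(prediction, ground_truth):
    """Dice overlap between two flat binary masks (sequences of 0/1 or False/True)."""
    if len(prediction) != len(ground_truth):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labeled as tumor in both masks.
    intersection = sum(1 for p, t in zip(prediction, ground_truth) if p and t)
    # Total tumor voxels in each mask, added together.
    total = sum(1 for p in prediction if p) + sum(1 for t in ground_truth if t)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total
```

For example, a prediction of `[1, 1, 0, 0]` against a ground truth of `[1, 0, 0, 0]` shares one voxel out of three labeled in total, giving 2 * 1 / 3, or about 0.67.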
    Automatic Machine Learning for Multi-Receiver CNN Technology Classifiers. (arXiv:2204.13819v1 [cs.LG])
    Convolutional Neural Networks (CNNs) are one of the most studied families of deep learning models for signal classification, including modulation, technology, detection, and identification. In this work, we focus on technology classification based on raw I/Q samples collected from multiple synchronized receivers. As an example use case, we study protocol identification of Wi-Fi, LTE-LAA, and 5G NR-U technologies that coexist over the 5 GHz Unlicensed National Information Infrastructure (U-NII) bands. Designing and training accurate CNN classifiers involves significant time and effort that goes into fine-tuning a model's architectural settings and determining the appropriate hyperparameter configurations, such as learning rate and batch size. We tackle the former by defining architectural settings themselves as hyperparameters. We attempt to automatically optimize these architectural parameters, along with other preprocessing (e.g., number of I/Q samples within each classifier input) and learning hyperparameters, by forming a Hyperparameter Optimization (HyperOpt) problem, which we solve in a near-optimal fashion using the Hyperband algorithm. The resulting near-optimal CNN (OCNN) classifier is then used to study classification accuracy for OTA as well as simulated datasets, considering various SNR values. We show that the number of receivers used to construct multi-channel inputs for CNNs should be defined as a preprocessing hyperparameter to be optimized via Hyperband. OTA results reveal that our OCNN classifiers improve classification accuracy by 24.58% compared to manually tuned CNNs. We also study the effect of min-max normalization of I/Q samples within each classifier's input on generalization accuracy over simulated datasets with SNRs other than the training set's SNR and show an average of 108.05% improvement when I/Q samples are normalized.  ( 2 min )
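The abstract above credits min-max normalization of the I/Q samples within each classifier input for better generalization across SNRs. A minimal sketch of one way to do this follows. It assumes each input window is a list of (I, Q) pairs and that a single minimum and maximum are taken jointly over all I and Q values in the window; the abstract does not say whether the authors normalize jointly or per component, so that choice, like the name `min_max_normalize`, is an assumption.

```python
def min_max_normalize(window):
    """Scale every I and Q value in one input window to the range [0, 1].

    `window` is a list of (I, Q) pairs. The minimum and maximum are computed
    over this window only, so each classifier input is scaled independently
    of the signal's absolute amplitude.
    """
    if not window:
        raise ValueError("window must contain at least one I/Q sample")
    values = [component for pair in window for component in pair]
    low, high = min(values), max(values)
    span = high - low
    if span == 0:
        # Constant window: map everything to 0.0 rather than divide by zero.
        return [(0.0, 0.0) for _ in window]
    return [((i - low) / span, (q - low) / span) for i, q in window]
```

Because the scaling uses only the window's own extremes, two captures of the same waveform at different received power map to the same normalized values, which is one plausible reason such preprocessing helps when test SNRs differ from the training SNR.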
    A Neural Network-enhanced Reproducing Kernel Particle Method for Modeling Strain Localization. (arXiv:2204.13821v1 [cs.CE])
    Modeling the localized intensive deformation in a damaged solid requires highly refined discretization for accurate prediction, which significantly increases the computational cost. Although adaptive model refinement can be employed for enhanced effectiveness, it is cumbersome for the traditional mesh-based methods to perform while modeling the evolving localizations. In this work, a neural network-enhanced reproducing kernel particle method (NN-RKPM) is proposed, where the location, orientation, and shape of the solution transition near a localization are automatically captured by the NN approximation via a block-level neural network optimization. The weights and biases in the blocked parametrization network control the location and orientation of the localization. The designed basic four-kernel NN block is capable of capturing a triple junction or a quadruple junction topological pattern, while more complicated localization topological patterns are captured by the superposition of multiple four-kernel NN blocks. The standard RK approximation is then utilized to approximate the smooth part of the solution, which permits a much coarser discretization than the high-resolution discretization needed to capture sharp solution transitions with the conventional methods. A regularization of the neural network approximation is additionally introduced for discretization-independent material responses. The effectiveness of the proposed NN-RKPM is verified by a series of numerical verifications.  ( 2 min )
    Industry-academia research collaboration and knowledge co-creation: Patterns and anti-patterns. (arXiv:2204.14180v1 [cs.SE])
    Increasing the impact of software engineering research in the software industry and the society at large has long been a concern of high priority for the software engineering community. The problem of two cultures, research conducted in a vacuum (disconnected from the real world), or misaligned time horizons are just some of the many complex challenges standing in the way of successful industry-academia collaborations. This paper reports on the experience of research collaboration and knowledge co-creation between industry and academia in software engineering as a way to bridge the research-practice collaboration gap. Our experience spans 14 years of collaboration between researchers in software engineering and the European and Norwegian software and IT industry. Using the participant observation and interview methods we have collected and afterwards analyzed an extensive record of qualitative data. Drawing upon the findings made and the experience gained, we provide a set of 14 patterns and 14 anti-patterns for industry-academia collaborations, aimed to support other researchers and practitioners in establishing and running research collaboration projects in software engineering.  ( 2 min )
    Application of machine learning methods to detect and classify Core images using GAN and texture recognition. (arXiv:2204.14224v1 [cs.CV])
    During exploration campaigns, oil companies rely heavily on drill core samples as they provide valuable geological information that helps them find important oil deposits. Traditional core logging techniques are laborious and subjective. Core imaging, a new technique in the oil industry, is used to supplement analysis by rapidly characterising large quantities of drill cores in a nondestructive and noninvasive manner. In this paper, we present the problem of core detection and classification. The first problem is detecting the cores and segmenting the holes in images by using Faster RCNN and Mask RCNN models, respectively. The second problem is filling the hole in the core image by applying the generative adversarial network (GAN) technique and using Contextual Residual Aggregation (CRA), which creates high-frequency residuals for missing content in images. Finally, texture recognition models are applied to classify the core images.  ( 2 min )
    Task Embedding Temporal Convolution Networks for Transfer Learning Problems in Renewable Power Time-Series Forecast. (arXiv:2204.13908v1 [cs.LG])
    Task embeddings in multi-layer perceptrons for multi-task learning and inductive transfer learning in renewable power forecasts have recently been introduced. In many cases, this approach reduces both the forecast error and the amount of training data required. However, it does not take the seasonal influences in power forecasts within a day into account, i.e., the diurnal cycle. Therefore, we extended this idea to temporal convolutional networks to consider those seasonalities. We propose transforming the embedding space, which contains the latent similarities between tasks, through convolution and providing these results to the network's residual block. Compared to the multi-layer perceptron approach, the proposed architecture improves multi-task learning for power forecasts by up to 25 percent on the EuropeWindFarm and GermanSolarFarm datasets. Based on the same data, we achieve a ten percent improvement for the wind datasets and more than 20 percent in most cases for the solar dataset for inductive transfer learning without catastrophic forgetting. Finally, we are the first to propose zero-shot learning for renewable power forecasts to provide predictions even if no training data is available.  ( 2 min )
    CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers. (arXiv:2204.14217v1 [cs.CV])
    The development of transformer-based text-to-image models is impeded by their slow generation and complexity for high-resolution images. In this work, we put forward a solution based on hierarchical transformers and local parallel auto-regressive generation. We pretrain a 6B-parameter transformer with a simple and flexible self-supervised task, Cross-modal general language model (CogLM), and finetune it for fast super-resolution. The new text-to-image system, CogView2, shows very competitive generation compared to concurrent state-of-the-art DALL-E-2, and naturally supports interactive text-guided editing on images.  ( 2 min )
    Escaping Spurious Local Minima of Low-Rank Matrix Factorization Through Convex Lifting. (arXiv:2204.14067v1 [cs.LG])
    This work proposes a rapid global solver for nonconvex low-rank matrix factorization (MF) problems that we name MF-Global. Through convex lifting steps, our method efficiently escapes saddle points and spurious local minima ubiquitous in noisy real-world data, and is guaranteed to always converge to the global optima. Moreover, the proposed approach adaptively adjusts the rank for the factorization and provably identifies the optimal rank for MF automatically in the course of optimization through tools of manifold identification, and thus it also spends significantly less time on parameter tuning than existing MF methods, which require an exhaustive search for this optimal rank. On the other hand, when compared to methods for solving the lifted convex form only, MF-Global leads to significantly faster convergence and much shorter running time. Experiments on real-world large-scale recommendation system problems confirm that MF-Global can indeed effectively escape spurious local solutions at which existing MF approaches get stuck, and is orders of magnitude faster than state-of-the-art algorithms for the lifted convex form.  ( 2 min )
    Learned Gradient of a Regularizer for Plug-and-Play Gradient Descent. (arXiv:2204.13940v1 [eess.IV])
    The Plug-and-Play (PnP) framework allows integrating advanced image denoising priors into optimization algorithms, to efficiently solve a variety of image restoration tasks. The Plug-and-Play alternating direction method of multipliers (ADMM) and the Regularization by Denoising (RED) algorithms are two examples of such methods that made a breakthrough in image restoration. However, while the former method only applies to proximal algorithms, it has recently been shown that there exists no regularization that explains the RED algorithm when the denoisers lack Jacobian symmetry, which happens to be the case for most practical denoisers. To the best of our knowledge, there exists no method for training a network that directly represents the gradient of a regularizer, which can be directly used in Plug-and-Play gradient-based algorithms. We show that it is possible to train a denoiser along with a network that corresponds to the gradient of its regularizer. We use this gradient of the regularizer in gradient-based optimization methods and obtain better results compared to other generic Plug-and-Play approaches. We also show that the regularizer can be used as a pre-trained network for unrolled gradient descent. Lastly, we show that the resulting denoiser allows for a quick convergence of the Plug-and-Play ADMM.  ( 2 min )
    Unsupervised Reinforcement Learning for Transferable Manipulation Skill Discovery. (arXiv:2204.13906v1 [cs.RO])
    Current reinforcement learning (RL) in robotics often experiences difficulty in generalizing to new downstream tasks due to the innate task-specific training paradigm. To alleviate it, unsupervised RL, a framework that pre-trains the agent in a task-agnostic manner without access to the task-specific reward, leverages active exploration for distilling diverse experience into essential skills or reusable knowledge. For exploiting such benefits also in robotic manipulation, we propose an unsupervised method for transferable manipulation skill discovery that ties structured exploration toward interacting behavior and transferable skill learning. It not only enables the agent to learn interaction behavior, the key aspect of the robotic manipulation learning, without access to the environment reward, but also to generalize to arbitrary downstream manipulation tasks with the learned task-agnostic skills. Through comparative experiments, we show that our approach achieves the most diverse interacting behavior and significantly improves sample efficiency in downstream tasks including the extension to multi-object, multitask problems.  ( 2 min )
    3D Common Corruptions and Data Augmentation. (arXiv:2203.01441v3 [cs.CV] UPDATED)
    We introduce a set of image transformations that can be used as corruptions to evaluate the robustness of models as well as data augmentation mechanisms for training neural networks. The primary distinction of the proposed transformations is that, unlike existing approaches such as Common Corruptions, the geometry of the scene is incorporated in the transformations, thus leading to corruptions that are more likely to occur in the real world. We also introduce a set of semantic corruptions (e.g. natural object occlusions). We show these transformations are 'efficient' (can be computed on-the-fly), 'extendable' (can be applied on most image datasets), expose vulnerability of existing models, and can effectively make models more robust when employed as '3D data augmentation' mechanisms. The evaluations on several tasks and datasets suggest incorporating 3D information into benchmarking and training opens up a promising direction for robustness research.
    Tag-assisted Multimodal Sentiment Analysis under Uncertain Missing Modalities. (arXiv:2204.13707v1 [cs.LG])
    Multimodal sentiment analysis has been studied under the assumption that all modalities are available. However, such a strong assumption does not always hold in practice, and most multimodal fusion models may fail when partial modalities are missing. Several works have addressed the missing modality problem, but most of them only considered the single modality missing case, and ignored the practically more general cases of multiple modalities missing. To this end, in this paper, we propose a Tag-Assisted Transformer Encoder (TATE) network to handle the problem of missing uncertain modalities. Specifically, we design a tag encoding module to cover both the single modality and multiple modalities missing cases, so as to guide the network's attention to those missing modalities. In addition, we adopt a new space projection pattern to align common vectors. Then, a Transformer encoder-decoder network is utilized to learn the missing modality features. Finally, the outputs of the Transformer encoder are used for the final sentiment classification. Extensive experiments are conducted on CMU-MOSI and IEMOCAP datasets, showing that our method can achieve significant improvements compared with several baselines.  ( 2 min )
    Learning cosmology and clustering with cosmic graphs. (arXiv:2204.13713v1 [astro-ph.CO])
    We train deep learning models on thousands of galaxy catalogues from the state-of-the-art hydrodynamic simulations of the CAMELS project to perform regression and inference. We employ Graph Neural Networks (GNNs), architectures designed to work with irregular and sparse data, like the distribution of galaxies in the Universe. We first show that GNNs can learn to compute the power spectrum of galaxy catalogues with a few percent accuracy. We then train GNNs to perform likelihood-free inference at the galaxy-field level. Our models are able to infer the value of $\Omega_{\rm m}$ with a $\sim12\%-13\%$ accuracy just from the positions of $\sim1000$ galaxies in a volume of $(25~h^{-1}{\rm Mpc})^3$ at $z=0$ while accounting for astrophysical uncertainties as modelled in CAMELS. Incorporating information from galaxy properties, such as stellar mass, stellar metallicity, and stellar radius, increases the accuracy to $4\%-8\%$. Our models are built to be translational and rotational invariant, and they can extract information from any scale larger than the minimum distance between two galaxies. However, our models are not completely robust: testing on simulations run with a different subgrid physics than the ones used for training does not yield as accurate results.  ( 2 min )
    Probabilistic Permutation Graph Search: Black-Box Optimization for Fairness in Ranking. (arXiv:2204.13765v1 [cs.LG])
    There are several measures for fairness in ranking, based on different underlying assumptions and perspectives. PL optimization with the REINFORCE algorithm can be used for optimizing black-box objective functions over permutations. In particular, it can be used for optimizing fairness measures. However, though effective for queries with a moderate number of repeating sessions, PL optimization has room for improvement for queries with a small number of repeating sessions. In this paper, we present a novel way of representing permutation distributions, based on the notion of permutation graphs. Similar to PL, our distribution representation, called PPG, can be used for black-box optimization of fairness. Different from PL, where pointwise logits are used as the distribution parameters, in PPG pairwise inversion probabilities together with a reference permutation construct the distribution. As such, the reference permutation can be set to the best sampled permutation regarding the objective function, making PPG suitable for both deterministic and stochastic rankings. Our experiments show that PPG, while comparable to PL for larger session repetitions (i.e., stochastic ranking), improves over PL for optimizing fairness metrics for queries with one session (i.e., deterministic ranking). Additionally, when accurate utility estimations are available, e.g., in tabular models, the performance of PPG in fairness optimization is significantly boosted compared to lower quality utility estimations from a learning to rank model, leading to a large performance gap with PL. Finally, the pairwise probabilities make it possible to impose pairwise constraints such as "item $d_1$ should always be ranked higher than item $d_2$." Such constraints can be used to simultaneously optimize the fairness metric and control another objective such as ranking performance.  ( 2 min )
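For readers unfamiliar with the baseline, the sketch below illustrates a distribution over permutations parameterized by pointwise logits. It assumes that "PL" refers to the Plackett-Luce model (the abstract does not expand the acronym) and it does not attempt to reproduce PPG. The function names `pl_log_prob` and `pl_sample` are hypothetical.

```python
import math
import random
from itertools import permutations


def pl_log_prob(logits, permutation):
    """Log-probability of a ranking under a Plackett-Luce model.

    logits[i] is the score of item i; permutation lists item indices
    from the top rank to the bottom rank.
    """
    remaining = list(permutation)
    log_p = 0.0
    for item in permutation:
        # The next item is chosen by a softmax over the items not yet placed.
        log_norm = math.log(sum(math.exp(logits[j]) for j in remaining))
        log_p += logits[item] - log_norm
        remaining.remove(item)
    return log_p


def pl_sample(logits, rng):
    """Draw one ranking by sampling items without replacement."""
    remaining = list(range(len(logits)))
    ranking = []
    while remaining:
        weights = [math.exp(logits[j]) for j in remaining]
        choice = rng.choices(remaining, weights=weights, k=1)[0]
        ranking.append(choice)
        remaining.remove(choice)
    return ranking


logits = [0.5, -1.0, 2.0]
# Sanity check: the probabilities of all 3! rankings should sum to one.
total = sum(math.exp(pl_log_prob(logits, p)) for p in permutations(range(3)))
print(round(total, 6))
print(pl_sample(logits, random.Random(0)))
```

A REINFORCE-style optimizer would sample rankings with `pl_sample`, score them with the black-box objective, and weight the gradient of `pl_log_prob` by that score; the gradient step itself is omitted here.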
    Fast Sampling of Diffusion Models with Exponential Integrator. (arXiv:2204.13902v1 [cs.LG])
    The past few years have witnessed the great success of Diffusion models (DMs) in generating high-fidelity samples in generative modeling tasks. A major limitation of the DM is its notoriously slow sampling procedure which normally requires hundreds to thousands of time discretization steps of the learned diffusion process to reach the desired accuracy. Our goal is to develop a fast sampling method for DMs with far fewer steps while retaining high sample quality. To this end, we systematically analyze the sampling procedure in DMs and identify key factors that affect the sample quality, among which the method of discretization is most crucial. By carefully examining the learned diffusion process, we propose Diffusion Exponential Integrator Sampler (DEIS). It is based on the Exponential Integrator designed for discretizing ordinary differential equations (ODEs) and leverages a semilinear structure of the learned diffusion process to reduce the discretization error. The proposed method can be applied to any DM and can generate high-fidelity samples in as few as 10 steps. In our experiments, it takes about 3 minutes on one A6000 GPU to generate $50k$ images from CIFAR10. Moreover, by directly using pre-trained DMs, we achieve the state-of-the-art sampling performance when the number of score function evaluations (NFE) is limited, e.g., 3.37 FID and 9.74 Inception score with only 15 NFEs on CIFAR10.  ( 2 min )
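To illustrate why an exponential integrator can reduce discretization error on a semilinear ODE, the sketch below compares the textbook first-order exponential Euler scheme with explicit Euler on the scalar problem dx/dt = a·x + g(x, t). This is a generic illustration, not DEIS itself; the test problem and the function names are assumptions chosen for the example.

```python
import math


def exponential_euler(a, g, x0, t0, t1, steps):
    """Integrate dx/dt = a*x + g(x, t) with the first-order exponential Euler scheme."""
    h = (t1 - t0) / steps
    decay = math.exp(a * h)
    # phi_1 factor: integral of exp(a*(h - s)) over s in [0, h].
    phi = h if a == 0 else (decay - 1.0) / a
    x, t = x0, t0
    for _ in range(steps):
        # The linear part is solved exactly; only g is frozen over the step.
        x = decay * x + phi * g(x, t)
        t += h
    return x


def explicit_euler(a, g, x0, t0, t1, steps):
    """Integrate the same ODE with the plain explicit Euler scheme."""
    h = (t1 - t0) / steps
    x, t = x0, t0
    for _ in range(steps):
        x = x + h * (a * x + g(x, t))
        t += h
    return x


def constant_forcing(x, t):
    return 1.0


a, x0 = -5.0, 1.0
exact = 0.2 + (x0 - 0.2) * math.exp(a * 1.0)  # closed form for dx/dt = -5x + 1
print(abs(exponential_euler(a, constant_forcing, x0, 0.0, 1.0, 4) - exact))
print(abs(explicit_euler(a, constant_forcing, x0, 0.0, 1.0, 4) - exact))
```

With a constant forcing term the exponential scheme is exact up to rounding even with four steps, while explicit Euler carries a visible error; with a state-dependent g the exponential scheme is no longer exact, but the stiff linear part still contributes no discretization error.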
    Who will stay? Using Deep Learning to predict engagement of citizen scientists. (arXiv:2204.14046v1 [cs.LG])
    Citizen science and machine learning should be considered for monitoring the coastal and ocean environment due to the scale of threats posed by climate change and the limited resources to fill knowledge gaps. Using data from the annotation activity of citizen scientists in a Swedish marine project, we constructed Deep Neural Network models to predict forthcoming engagement. We tested the models to identify patterns in annotation engagement. Based on the results, it is possible to predict whether an annotator will remain active in future sessions. Depending on the goals of individual citizen science projects, it may also be necessary to identify either those volunteers who will leave or those who will continue annotating. This can be predicted by varying the threshold for the prediction. The engagement metrics used to construct the models are based on time and activity and can be used to infer latent characteristics of volunteers and predict their task interest based on their activity patterns. They can estimate if volunteers can accomplish a given number of tasks in a certain amount of time, identify early on who is likely to become a top contributor or identify who is likely to quit and provide them with targeted interventions. The novelty of our predictive models lies in the use of Deep Neural Networks and the sequence of volunteer annotations. A limitation of our models is that they do not use embeddings constructed from user profiles as input data, as many recommender systems do. We expect that including user profiles would improve prediction performance.  ( 2 min )
    Visualization and Optimization Techniques for High Dimensional Parameter Spaces. (arXiv:2204.13812v1 [cs.HC])
    High dimensional parameter space optimization is crucial in many applications. The parameters affecting performance can be both numerical and categorical in their type. The existing techniques of black-box optimization and visual analytics are good at dealing with numerical parameters, but analyzing categorical variables in the context of the numerical variables is not well studied. Hence, we propose a novel approach to create an auto-tuning framework for storage systems optimization combining both direct optimization techniques and visual analytics research. While the optimization algorithm will be the core of the system, visual analytics will provide a guideline with the help of an external agent (expert) to provide crucial hints to narrow down the large search space for the optimization engine. As part of the initial step towards creating an auto-tuning engine for storage systems optimization, we created an Interactive Configuration Explorer (ICE), which directly addresses the need of analysts to learn how the dependent numerical variable is affected by the parameter settings given multiple optimization objectives. No information is lost as ICE shows the complete distribution and statistics of the dependent variable in context with each categorical variable. Analysts can interactively filter the variables to optimize for certain goals such as achieving a system with maximum performance, low variance, etc. Our system was developed in tight collaboration with a group of systems performance researchers and its final effectiveness was evaluated with expert interviews, a comparative user study, and two case studies. We also discuss our research plan for creating an efficient auto-tuning framework combining black-box optimization and visual analytics for storage systems performance optimization.  ( 2 min )
    Coupling Deep Imputation with Multitask Learning for Downstream Tasks on Genomics Data. (arXiv:2204.13705v1 [q-bio.GN])
    Genomics data such as RNA gene expression, methylation and micro RNA expression are valuable sources of information for various clinical predictive tasks. For example, predicting survival outcomes, cancer histology type and other patient-related information is possible using not only clinical data but molecular data as well. Moreover, using these data sources together, for example in multitask learning, can boost the performance. However, in practice, there are many missing data points, which lead to significantly lower patient numbers when analysing full cases, which in our setting refers to all modalities being present. In this paper we investigate how imputing data with missing values using deep learning coupled with multitask learning can help to reach state-of-the-art performance results using combined genomics modalities, RNA, micro RNA and methylation. We propose a generalised deep imputation method to impute values where a patient has all modalities present except one. Interestingly enough, deep imputation alone outperforms multitask learning alone for the classification and regression tasks across most combinations of modalities. In contrast, when using all modalities for survival prediction we observe that multitask learning alone outperforms deep imputation alone with statistical significance (adjusted p-value 0.03). Thus, both approaches are complementary when optimising performance for downstream predictive tasks.  ( 2 min )
    Tractable Uncertainty for Structure Learning. (arXiv:2204.14170v1 [cs.LG])
    Bayesian structure learning allows one to capture uncertainty over the causal directed acyclic graph (DAG) responsible for generating given data. In this work, we present Tractable Uncertainty for STructure learning (TRUST), a framework for approximate posterior inference that relies on probabilistic circuits as the representation of our posterior belief. In contrast to sample-based posterior approximations, our representation can capture a much richer space of DAGs, while being able to tractably answer a range of useful inference queries. We empirically show how probabilistic circuits can be used as an augmented representation for structure learning methods, leading to improvement in both the quality of inferred structures and posterior uncertainty. Experimental results also demonstrate the improved representational capacity of TRUST, outperforming competing methods on conditional query answering.  ( 2 min )
    GenDR: A Generalized Differentiable Renderer. (arXiv:2204.13845v1 [cs.CV])
    In this work, we present and study a generalized family of differentiable renderers. We discuss from scratch which components are necessary for differentiable rendering and formalize the requirements for each component. We instantiate our general differentiable renderer, which generalizes existing differentiable renderers like SoftRas and DIB-R, with an array of different smoothing distributions to cover a large spectrum of reasonable settings. We evaluate an array of differentiable renderer instantiations on the popular ShapeNet 3D reconstruction benchmark and analyze the implications of our results. Surprisingly, the simple uniform distribution yields the best overall results when averaged over 13 classes; in general, however, the optimal choice of distribution heavily depends on the task.  ( 2 min )
    Energy Minimization for Federated Asynchronous Learning on Battery-Powered Mobile Devices via Application Co-running. (arXiv:2204.13878v1 [cs.DC])
    Energy is an essential, but often forgotten aspect in large-scale federated systems. As most of the research focuses on tackling computational and statistical heterogeneity from the machine learning algorithms, the impact on the mobile system still remains unclear. In this paper, we design and implement an online optimization framework by connecting asynchronous execution of federated training with application co-running to minimize energy consumption on battery-powered mobile devices. From a series of experiments, we find that co-running the training process in the background with foreground applications gives the system a deep energy discount with negligible performance slowdown. Based on these results, we first study an offline problem assuming all the future occurrences of applications are available, and propose a dynamic programming-based algorithm. Then we propose an online algorithm using the Lyapunov framework to explore the solution space via the energy-staleness trade-off. The extensive experiments demonstrate that the online optimization framework can save over 60% energy with 3 times faster convergence speed compared to the previous schemes.  ( 2 min )
    Noise-reducing attention cross fusion learning transformer for histological image classification of osteosarcoma. (arXiv:2204.13838v1 [eess.IV])
    The degree of malignancy of osteosarcoma and its tendency to metastasize/spread mainly depend on the pathological grade (determined by observing the morphology of the tumor under a microscope). The purpose of this study is to use artificial intelligence to classify osteosarcoma histological images and to assess tumor survival and necrosis, which will help doctors reduce their workload, improve the accuracy of osteosarcoma cancer detection, and make a better prognosis for patients. The study proposes a typical transformer image classification framework by integrating a noise reduction convolutional autoencoder and feature cross fusion learning (NRCA-FCFL) to classify osteosarcoma histological images. The noise reduction convolutional autoencoder can effectively denoise histological images of osteosarcoma, resulting in cleaner images for osteosarcoma classification. Moreover, we introduce feature cross fusion learning, which integrates image patches at two scales, to sufficiently explore their interactions by using additional classification tokens. As a result, a refined fusion feature is generated, which is fed to the residual neural network for label predictions. We conduct extensive experiments to evaluate the performance of the proposed approach. The experimental results demonstrate that our method outperforms the traditional and deep learning approaches on various evaluation metrics, with an accuracy of 99.17% to support osteosarcoma diagnosis.  ( 2 min )
    CAVES: A Dataset to facilitate Explainable Classification and Summarization of Concerns towards COVID Vaccines. (arXiv:2204.13746v1 [cs.CL])
    Convincing people to get vaccinated against COVID-19 is a key societal challenge in the present times. As a first step towards this goal, many prior works have relied on social media analysis to understand the specific concerns that people have towards these vaccines, such as potential side-effects, ineffectiveness, political factors, and so on. Though there are datasets that broadly classify social media posts into Anti-vax and Pro-Vax labels, there is no dataset (to our knowledge) that labels social media posts according to the specific anti-vaccine concerns mentioned in the posts. In this paper, we have curated CAVES, the first large-scale dataset containing about 10k COVID-19 anti-vaccine tweets labelled into various specific anti-vaccine concerns in a multi-label setting. This is also the first multi-label classification dataset that provides explanations for each of the labels. Additionally, the dataset also provides class-wise summaries of all the tweets. We also perform preliminary experiments on the dataset and show that this is a very challenging dataset for multi-label explainable classification and tweet summarization, as is evident by the moderate scores achieved by some state-of-the-art models. Our dataset and codes are available at: https://github.com/sohampoddar26/caves-data  ( 2 min )
    An Intriguing Property of Geophysics Inversion. (arXiv:2204.13731v1 [cs.LG])
    Inversion techniques are widely used to reconstruct subsurface physical properties (e.g., velocity, conductivity, and others) from surface-based geophysical measurements (e.g., seismic, electric/magnetic (EM) data). The problems are governed by partial differential equations (PDEs) like the wave or Maxwell's equations. Solving geophysical inversion problems is challenging due to the ill-posedness and high computational cost. To alleviate those issues, recent studies leverage deep neural networks to learn the inversion mappings from geophysical measurements to the geophysical property directly. In this paper, we show that such a mapping can be well modeled by a very shallow (but not wide) network with only five layers. This is achieved based on our new finding of an intriguing property: a near-linear relationship between the input and output, after applying an integral transform in a high-dimensional space. In particular, when dealing with the inversion from seismic data to subsurface velocity governed by a wave equation, the integral results of velocity with Gaussian kernels are linearly correlated to the integral of seismic data with sine kernels. Furthermore, this property can be easily turned into a light-weight encoder-decoder network for inversion. The encoder contains the integration of seismic data and the linear transformation without need for fine-tuning. The decoder only consists of a single transformer block to reverse the integral of velocity. Experiments show that this interesting property holds for two geophysics inversion problems over four different datasets. Compared to the much deeper InversionNet, our method achieves comparable accuracy, but consumes significantly fewer parameters.  ( 2 min )
    Learning to Split for Automatic Bias Detection. (arXiv:2204.13749v1 [cs.LG])
    Classifiers are biased when trained on biased datasets. As a remedy, we propose Learning to Split (ls), an algorithm for automatic bias detection. Given a dataset with input-label pairs, ls learns to split this dataset so that predictors trained on the training split generalize poorly to the testing split. This performance gap provides a proxy for measuring the degree of bias in the learned features and can therefore be used to reduce biases. Identifying non-generalizable splits is challenging as we don't have any explicit annotations about how to split. In this work, we show that the prediction correctness of the testing example can be used as a source of weak supervision: generalization performance will drop if we move examples that are predicted correctly away from the testing split, leaving only those that are mispredicted. We evaluate our approach on Beer Review, Waterbirds, CelebA and MNLI. Empirical results show that ls is able to generate astonishingly challenging splits that correlate with human-identified biases. Moreover, we demonstrate that combining robust learning algorithms (such as group DRO) with splits identified by ls enables automatic de-biasing. Compared with previous state-of-the-art methods, we substantially improve the worst-group performance (23.4% on average) when the source of biases is unknown during training and validation.  ( 2 min )
    Detecting Textual Adversarial Examples Based on Distributional Characteristics of Data Representations. (arXiv:2204.13853v1 [cs.CL])
    Although deep neural networks have achieved state-of-the-art performance in various machine learning tasks, adversarial examples, constructed by adding small non-random perturbations to correctly classified inputs, successfully fool highly expressive deep classifiers into incorrect predictions. Approaches to adversarial attacks in natural language tasks have boomed in the last five years using character-level, word-level, phrase-level, or sentence-level textual perturbations. While there is some work in NLP on defending against such attacks through proactive methods, like adversarial training, there are to our knowledge no effective general reactive approaches to defence via detection of textual adversarial examples such as are found in the image processing literature. In this paper, we propose two new reactive methods for NLP to fill this gap, which unlike the few limited application baselines from NLP are based entirely on distribution characteristics of learned representations: we adapt one from the image processing literature (Local Intrinsic Dimensionality (LID)), and propose a novel one (MultiDistance Representation Ensemble Method (MDRE)). Adapted LID and MDRE obtain state-of-the-art results on character-level, word-level, and phrase-level attacks on the IMDB dataset as well as on the latter two with respect to the MultiNLI dataset. For future research, we publish our code.  ( 2 min )
    Leveraging triplet loss and nonlinear dimensionality reduction for on-the-fly channel charting. (arXiv:2204.13996v1 [cs.NI])
    Channel charting is an unsupervised learning method that aims at mapping wireless channels to a so-called chart, preserving as much as possible spatial neighborhoods. In this paper, a model-based deep learning approach to this problem is proposed. It builds on a physically motivated distance measure to structure and initialize a neural network that is subsequently trained using a triplet loss function. The proposed structure exhibits a low number of parameters and clever initialization leads to fast training. These two features make the proposed approach amenable to on-the-fly channel charting. The method is empirically assessed on realistic synthetic channels, yielding encouraging results.  ( 2 min )
    A Mixed-Domain Self-Attention Network for Multilabel Cardiac Irregularity Classification Using Reduced-Lead Electrocardiogram. (arXiv:2204.13917v1 [cs.LG])
    Electrocardiogram (ECG) is commonly used to detect cardiac irregularities such as atrial fibrillation, bradycardia, and other irregular complexes. While previous studies have achieved great success in classifying these irregularities with standard 12-lead ECGs, there is limited evidence demonstrating the utility of reduced-lead ECGs in capturing a wide range of diagnostic information. In addition, the generalizability of classification models across multiple recording sources remains unexplored. As part of the PhysioNet Computing in Cardiology Challenge 2021, our team, HaoWan AIeC, proposed Mixed-Domain Self-Attention Resnet (MDARsn) to identify cardiac abnormalities from reduced-lead ECG. Our classifiers received scores of 0.602, 0.593, 0.597, 0.591, and 0.589 (ranked 54th, 37th, 38th, 38th, and 39th) for the 12-lead, 6-lead, 4-lead, 3-lead, and 2-lead versions of the hidden validation set with the evaluation metric defined by the challenge.  ( 2 min )
    Biologically-inspired neuronal adaptation improves learning in neural networks. (arXiv:2204.14008v1 [cs.NE])
    Since humans still outperform artificial neural networks on many tasks, drawing inspiration from the brain may help to improve current machine learning algorithms. Contrastive Hebbian Learning (CHL) and Equilibrium Propagation (EP) are biologically plausible algorithms that update weights using only local information (without explicitly calculating gradients) and still achieve performance comparable to conventional backpropagation. In this study, we augmented CHL and EP with Adjusted Adaptation, inspired by the adaptation effect observed in neurons, in which a neuron's response to a given stimulus is adjusted after a short time. We add this adaptation feature to multilayer perceptrons and convolutional neural networks trained on MNIST and CIFAR-10. Surprisingly, adaptation improved the performance of these networks. We discuss the biological inspiration for this idea and investigate why Neuronal Adaptation could be an important brain mechanism to improve the stability and accuracy of learning.  ( 2 min )
    Probabilistic Models for Manufacturing Lead Times. (arXiv:2204.13792v1 [cs.LG])
    In this study, we utilize Gaussian processes, probabilistic neural network, natural gradient boosting, and quantile regression augmented gradient boosting to model lead times of laser manufacturing processes. We introduce probabilistic modelling in the domain and compare the models in terms of different abilities. While providing a comparison between the models in real-life data, our work has many use cases and substantial business value. Our results indicate that all of the models beat the company estimation benchmark that uses domain experience and have good calibration with the empirical frequencies.  ( 2 min )
    BEINIT: Avoiding Barren Plateaus in Variational Quantum Algorithms. (arXiv:2204.13751v1 [quant-ph])
    Barren plateaus are a notorious problem in the optimization of variational quantum algorithms and pose a critical obstacle in the quest for more efficient quantum machine learning algorithms. Many potential reasons for barren plateaus have been identified but few solutions have been proposed to avoid them in practice. Existing solutions are mainly focused on the initialization of unitary gate parameters without taking into account the changes induced by input data. In this paper, we propose an alternative strategy which initializes the parameters of a unitary gate by drawing from a beta distribution. The hyperparameters of the beta distribution are estimated from the data. To further prevent barren plateau during training we add a novel perturbation at every gradient descent step. Taking these ideas together, we empirically show that our proposed framework significantly reduces the possibility of a complex quantum neural network getting stuck in a barren plateau.  ( 2 min )

  • Open

    [D] How would you update "A Super Harsh Guide to Machine Learning"?
    Hey, So, I still see A Super Harsh Guide to Machine Learning get mentioned when people give advice for those new to the field. First, read fucking Hastie, Tibshirani, and whoever. Chapters 1-4 and 7-8. If you don't understand it, keep reading it until you do. You can read the rest of the book if you want. You probably should, but I'll assume you know all of it. Take Andrew Ng's Coursera. Do all the exercises in python and R. Make sure you get the same answers with all of them. Now forget all of that and read the deep learning book. Put tensorflow and pytorch on a Linux box and run examples until you get it. Do stuff with CNNs and RNNs and just feed forward NNs. Once you do all of that, go on arXiv and read the most recent useful papers. The literature changes every few months, so keep up. There. Now you can probably be hired most places. If you need resume filler, do some Kaggle competitions. If you have debugging questions, use StackOverflow. If you have math questions, read more. If you have life questions, I have no idea. However, the post is 5 years old and honestly seems out of date (with the Andrew Ng coursera stuff for ex). How would you update it? submitted by /u/Soft-Ear-6905 [link] [comments]  ( 2 min )
    [P][N] Using Python in HTML - New project by Anaconda
    Peter Wang, the co-founder and CEO of Anaconda, shared at PyCon US a new open source project called PyScript. The project's goal is to enable using Python in HTML files! This is a game-changer for Python dev in general and ML practitioners in particular. It unlocks a world of opportunities and shareability. Peter had a live coding session (respect!) and showed some of PyScript's capabilities. He started with a basic "hello world" example, or better yet "hello PyCon", and very quickly moved to show more advanced applications running in the browser, written in Python and wrapped in HTML! The first app was a Super Mario game where he controlled the player with hand gestures using computer vision packages written in Python. The second one was an interactive dashboard of taxi travels in Manha…  ( 2 min )
    [D] Meaningful discussions
    One of the reasons I left academia was the sense that I rarely actually had any meaningful discussions about research that interested me. I published, gave talks, went to conferences, went to workshops, tried to engage smart, important people... It was pretty common to get, "interesting work", "nice jobs", "have you thought about using it for this problem..." or to get superficial citations. But it was extremely rare to find someone who actually would think together about a topic, to care enough to constructively criticize, to delve into the details together, to share in the question and research. My question is for those that have found it at different times, what was the context? What did you do to find it? What did you do with it? Did you figure out how to nurture it? Is it easier online? submitted by /u/ChinCoin [link] [comments]
    [P] music2viz: Conditioning Latent Diffusion Models on Audio Windows (proof of concept)
    submitted by /u/DoeL [link] [comments]  ( 1 min )
    [R][P] Self-Distilled StyleGAN: Towards Generation from Internet Photos + Gradio Web Demo
    submitted by /u/Illustrious_Row_9971 [link] [comments]  ( 1 min )
    [P] How Forte Transforms the Building of ML Solutions with PyTorch into Assembly Lines
    submitted by /u/julie_ai [link] [comments]
  • Open

    Introducing Kohonen Networks (Self-Organizing Maps) for beginners
    I would like to share with you a tutorial that I have recently made to explain in a very practical, introductory and visual way what Kohonen Neural Networks (Self-Organizing Maps) are. I explain, step by step, and through animations and C code, how to implement this well-known unsupervised learning algorithm to classify and detect patterns in large volumes of data. I hope it is of interest, especially to developers who are just starting out in this area. Best regards! Subtitles in English, Spanish and Catalan. https://youtu.be/UawpUKlFzRs submitted by /u/anadalg [link] [comments]  ( 1 min )
    Artificial Nightmares: FEAR || Clip Guided Diffusion AI Art Video [4K 20 FPS]
    submitted by /u/Thenamessd [link] [comments]
    Iterative to Launch Open Source Tool, First to Train ML Models on Any Cloud Using Terraform Solution
    submitted by /u/thumbsdrivesmecrazy [link] [comments]
    Aphex Twin's 'Vordhosbn' continued by OpenAI Jukebox (over a dozen different AI generated samples)
    submitted by /u/gloriousapplecart [link] [comments]
    OpenAI's DALL-E 2 still has a few problems with concepts - and can't count
    submitted by /u/much_successes [link] [comments]  ( 1 min )
    Green concrete: Meta is using AI to reduce concrete’s carbon footprint
    submitted by /u/qptbook [link] [comments]
    AI Dream 39 - Trippy Fractal Maze 4K
    submitted by /u/LordPewPew777 [link] [comments]
    If I wanted to start making Ai how could I do this.
    I would like to try. submitted by /u/Privatepizza08 [link] [comments]  ( 4 min )
  • Open

    How Power BI Applications Are Reshaping The Healthcare Industry
    Dealing with a whopping amount of data is normal for businesses in any sector these days. Without using this information obtained from various sources, these entities find it hard to analyze various factors and make strategic decisions. The same can be said about the healthcare sector. Especially after the Covid-19 pandemic, the clinics and medical… The post How Power BI Applications Are Reshaping The Healthcare Industry appeared first on Data Science Central.  ( 5 min )
    Hyperloop Technology- Advancing into the Future
    The idea of Hyperloop technology was put forth by Elon Musk, who in 2013 made it open source through a white paper. Hyperloop technology revolves around building an ultra-high-speed ground transportation system. In a hyperloop system, special vacuum tubes are built above or below ground, in which pods can travel at high speed. Such systems can… The post Hyperloop Technology- Advancing into the Future appeared first on Data Science Central.  ( 2 min )
  • Open

    Open AI GYM Lunar Lander DQN Reinforcement Learning Algorithm Performance After 100 000 Timesteps
    submitted by /u/elonmusk12345_ [link] [comments]  ( 1 min )
    Adversarial Attacks using Reinforcement Learning
    Hi All, I have been looking a bit into adversarial attacks on NNs recently. Particularly in the NLP side of things. I also came across some papers about adversarial attacks on RL algorithms. But these were about attacks ON reinforcement learning algorithms and not USING them to attack something else. I tried to find some papers on RL agents as attackers but came out empty. Intuitively, I do realize that designing an adversarial example generator as an RL env is tricky, even more so for NLP. Do you think this is a feasible research direction? Also, any related papers to get myself on the starting line would be super helpful. Thanks in advance. submitted by /u/abyaadrafid [link] [comments]  ( 1 min )
    Is it possible to modify the reward function during training of an agent using OpenAI/Stable-Baselines3?
    I am currently implementing an idea where I want the agent to get a large reward for objective A at the start of training, but as the agent learns and gets more mature, I want the reward for this objective to reduce slightly. Is this sort of thing easy to implement? Is it possible? Any help on this would be great :) Thanks submitted by /u/C_BearHill [link] [comments]  ( 1 min )
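One possible way to approach the question above, sketched in plain Python under stated assumptions: wrap the environment and scale a bonus for objective A by a schedule that depends on how many steps have been taken so far. `ToyEnv`, `DecayingBonusWrapper`, the `objective_a` info key and all numeric values are hypothetical names chosen for illustration; the wrapper only mimics the gym-style `reset`/`step` convention and does not use the Gym or Stable-Baselines3 APIs themselves.

```python
class ToyEnv:
    """Hypothetical stand-in environment: every step gives a base reward of 1.0
    and reports that objective A was achieved."""

    def reset(self):
        return 0  # dummy observation

    def step(self, action):
        obs, base_reward, done = 0, 1.0, False
        info = {"objective_a": True}
        return obs, base_reward, done, info


class DecayingBonusWrapper:
    """Adds a bonus for objective A that shrinks linearly with total steps taken."""

    def __init__(self, env, start_bonus=10.0, end_bonus=1.0, decay_steps=1000):
        self.env = env
        self.start_bonus = start_bonus
        self.end_bonus = end_bonus
        self.decay_steps = decay_steps
        self.total_steps = 0  # counted across episodes, not reset per episode

    def current_bonus(self):
        # Linear interpolation from start_bonus to end_bonus, then held constant.
        frac = min(self.total_steps / self.decay_steps, 1.0)
        return self.start_bonus + frac * (self.end_bonus - self.start_bonus)

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        if info.get("objective_a"):
            reward += self.current_bonus()
        self.total_steps += 1
        return obs, reward, done, info


env = DecayingBonusWrapper(ToyEnv(), start_bonus=10.0, end_bonus=1.0, decay_steps=4)
env.reset()
rewards = [env.step(0)[1] for _ in range(6)]
print(rewards)
```

Keeping the schedule inside a wrapper leaves the underlying environment untouched. One caveat: a reward that changes during training makes the learning problem non-stationary, so a slow decay is probably the safer choice.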
    Question about the curriculum learning
    Hi, this so-called curriculum learning sounds very interesting. But what would the practical usage of this technique look like? Assuming the goal task is "grasping an apple", I would divide this task into two subtasks: 1) "How to approach an apple" 2) "How to grasp an object". Then, I would first train the agent on the first subtask until the reward exceeds the threshold. The trained "how_to_approach_to_an_object.pth" would then be used as the starting point for training on the second task. Is this the right approach? submitted by /u/Fun-Moose-3841 [link] [comments]  ( 1 min )
    How to structure Tile Coding input
    Let's say I have a model that wants to encode the observation space using tile coding. My observation space is a football (soccer) game, thus I was thinking of having 3 different tile codings. One for the player position, one for the team mates, and one for the opponents. Each one would encode the position and general direction of each category of player. Which is the best way to then feed this data into an RL model? Let's say I split the pitch into 16 tiles, and I am using 4 grids. Thus I would have an array of length 64. Since I wish to encode multiple inputs, should I just append 3 different length 64 arrays together, or is there a more efficient way to represent multiple tile encodings, or is my entire proposition wrong altogether? Thanks submitted by /u/uom_questions [link] [comments]  ( 1 min )
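The concatenation idea in the question can be sketched as follows, assuming positions normalised to [0, 1) on both axes, a 4x4 grid (16 tiles) per tiling and 4 offset tilings, so 64 binary features per category and 192 in total. The function names, the offset scheme and the clamping at the pitch edge are illustrative simplifications rather than a reference implementation, and player direction is left out.

```python
def tile_indices(x, y, n_tilings=4, tiles_per_side=4):
    """Return one active tile index per tiling for a point in [0, 1) x [0, 1)."""
    tiles_per_tiling = tiles_per_side * tiles_per_side
    width = 1.0 / tiles_per_side
    active = []
    for t in range(n_tilings):
        # Each tiling is shifted by a fraction of a tile width.
        offset = t * width / n_tilings
        # Clamp so shifted points near the edge stay inside the grid (simplification).
        col = min(int((x + offset) / width), tiles_per_side - 1)
        row = min(int((y + offset) / width), tiles_per_side - 1)
        active.append(t * tiles_per_tiling + row * tiles_per_side + col)
    return active


def encode(groups, n_tilings=4, tiles_per_side=4):
    """Concatenate one binary tile-coded block per group of (x, y) positions."""
    block = n_tilings * tiles_per_side ** 2  # 64 features per group here
    features = [0] * (block * len(groups))
    for g, positions in enumerate(groups):
        for x, y in positions:
            for idx in tile_indices(x, y, n_tilings, tiles_per_side):
                features[g * block + idx] = 1
    return features


player = [(0.1, 0.1)]
teammates = [(0.5, 0.5), (0.9, 0.2)]
opponents = [(0.3, 0.8)]
features = encode([player, teammates, opponents])
print(len(features), sum(features))
```

Because only a handful of entries are ever 1, an alternative with the same information is to pass the list of active indices to a linear model or an embedding lookup instead of the dense vector.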
  • Open

    Introducing Kohonen Networks (Self-Organizing Maps) for beginners
    Hi team 👋! I would like to share with you a tutorial that I have recently made to explain in a very practical, introductory and visual way what Kohonen Neural Networks (Self-Organizing Maps) are 🧠 I explain, step by step, and through animations and C code, how to implement this well-known unsupervised learning algorithm to classify and detect patterns in large volumes of data. I hope it is of interest, especially to developers who are just starting out in this area. Best regards! Subtitles in English, Spanish and Catalan. https://youtu.be/UawpUKlFzRs submitted by /u/anadalg [link] [comments]  ( 1 min )

  • Open

    osu!
    Was thinking of coding an rl agent that learns to play osu, but stuck on wrapping the program in a gym environment. If anyone is interested and could help me out please dm me submitted by /u/apple-soda-ds [link] [comments]
    Are there any other high-quality pre-built Python training environments other than Open AI's GYM?
    submitted by /u/elonmusk12345_ [link] [comments]  ( 1 min )
    Shortcomings of Robotics Simulation Environments and Tools
    submitted by /u/probznotarobot [link] [comments]  ( 1 min )
    Seeking advice in designing reward function
    Hi all, I am trying to introduce myself to reinforcement learning by designing simple learning scenarios: As you can see below, I am currently working with a simple 3 degree of freedom robot. The task that I gave the robot to explore is to reach the sphere with its end-effector. In that case, the cost function is pretty simple: reward_function = d. Now, I would like to make the task a bit more complex by saying: "First, approach the goal just by using q1 and then use q2 and q3, if any distance remains". I am not sure how to formulate this sequential movement of q1 and q2, q3 as a reward function... any advice? https://preview.redd.it/klwne9fthpw81.png?width=690&format=png&auto=webp&s=ba53ad800884f90778f60d9ea5e152df94331cd9 submitted by /u/Fun-Moose-3841 [link] [comments]  ( 1 min )
    Bellman equation
    To evaluate a policy, we need to calculate the value of a state s as the weighted sum of the reward and the discounted estimated value of the next state s'. However, I don't understand how we can obtain the discounted estimated value of the next state s'. submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
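A short sketch of how the next-state estimate is usually obtained in iterative policy evaluation: it is read from the current value table, which starts from an arbitrary guess (zeros here) and is refined by repeated sweeps of the update V(s) = sum over outcomes of p * (r + gamma * V(s')). The three-state chain, its rewards and the discount factor below are made up purely for illustration.

```python
# Hypothetical 3-state chain under a fixed policy: 0 -> 1 -> 2 (terminal).
# transitions[s] is a list of (probability, next_state, reward) outcomes.
transitions = {
    0: [(1.0, 1, 0.0)],
    1: [(1.0, 2, 1.0)],
    2: [],  # terminal state: its value stays 0
}
gamma = 0.5

V = {s: 0.0 for s in transitions}  # initial guess for every state value

for sweep in range(50):
    delta = 0.0
    for s, outcomes in transitions.items():
        if not outcomes:
            continue  # nothing to update for terminal states
        # Bellman expectation backup: uses the *current* estimate V[s_next].
        new_v = sum(p * (r + gamma * V[s_next]) for p, s_next, r in outcomes)
        delta = max(delta, abs(new_v - V[s]))
        V[s] = new_v
    if delta < 1e-9:
        break  # estimates stopped changing, so they satisfy the equation

print(V)
```

The first sweep uses the zero guess for the next state, so state 0 is still wrong after it. The second sweep picks up the corrected value of state 1, and the table then stops changing.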
    Advice for my uni dissertation wanted!
    I have built a neural network to try and solve a reinforcement learning problem, attempting both the Reinforce and the A2C algorithm to try and solve a resource management problem. The results are very mediocre after running on the uni supercomputer for a week. I was hoping someone could give me some advice on an algorithm or technique that is better suited to the problem or give some positive criticism. I have found that a very simple algorithmic solution that I wrote is able to get a better reward at the game than the networks after their week of training :( TLDR I get bad results when I run the reinforcement learning neural networks for a larger environment. I was thinking maybe something like pre training the critic against an algorithmically generated evaluation may work to massi…  ( 2 min )
    How do you stay updated with RL ?
    So, I was wondering how you guys stay updated with RL. Apart from reading papers, is there any newsletter you are subscribed to? Any Slack channel, Facebook group, Twitter account or any other community out there you want to suggest? TIA submitted by /u/AbdullahMohammadKhan [link] [comments]  ( 1 min )
    Custom Gym Env for movie recommender system
    Hi everyone, I am new to RL so your help would be much appreciated. I’m working on a recommender system using Deep Reinforcement Learning. I made a custom gym environment, that implements OpenAI’s Gym interface, for the MovieLens dataset. The dataset contains users’ ratings for movies (ranging from 1 to 5). I used some of Stable Baselines3’s reinforcement learning algorithms (A2C, PPO) to test my environment before I proceed to implement my own. Running a training recipe, I noticed that both agents (A2C, PPO) will recommend only action = 3 after a number of timesteps. That seems a bit odd and I can’t find where the bug is. My first thought is that it has to do with the reward function. Currently, I'm using the following function to calculate rewards. https://preview.redd.it/6r3lf9z9bnw81.png?width=550&format=png&auto=webp&s=103d270ebe8c08b753c9033b60066794a586e3fa This is the GitHub link to my code. Am I missing something? Any thoughts? Thank you in advance submitted by /u/Narrow-Style497 [link] [comments]  ( 1 min )
    NLP startup ideas
    Anybody got startup ideas in NLP? Do share. If interested, let’s collab. submitted by /u/thoughtfulcomet [link] [comments]
    A survey: what should we expect from multi-agent reinforcement learning benchmarking work?
    We are doing a survey related to multi-agent reinforcement learning systems and benchmarks and would love to hear your opinion. This survey has 3 questions and will take about 10 seconds to complete. We really appreciate your participation. The survey URL is here. submitted by /u/TTTheohhhu [link] [comments]  ( 1 min )
    Object detection with depth measurement using pre-trained models with OAK-D
    🚀 New Post: Object Detection with Depth Perception https://learnopencv.com/object-detection-with-depth-measurement-with-oak-d/ ​ https://preview.redd.it/zikmf13fzlw81.jpg?width=3000&format=pjpg&auto=webp&s=484a7d8a80d5c0ff594e5fe6178b94a7783a1e65 Spatial AI is the ability of an artificial intelligence system to reason not just based on what it is looking at, but also based on distance from the camera (or depth perception). ​ OpenCV AI Kit with Depth (OAK-D) is a powerful yet affordable Spatial AI camera perfect for people who want to learn how to combine the power of neural networks with depth perception. ​ Today's post is part of our series on OAK-D https://learnopencv.com/introduction-to-opencv-ai-kit-and-depthai/ https://learnopencv.com/stereo-vision-and-depth-estimation-using-opencv-ai-kit/ ​ Code Link : https://github.com/spmallick/learnopencv/tree/master/OAK-Object-Detection-with-Depth ​ #AI #ComputerVision #ML #ArtificialIntelligence #MachineLearning #OpenCV #DL #DeepLearning #OAKD submitted by /u/spmallick [link] [comments]  ( 1 min )
    What is the difference between the environment state and the agent state?
    submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
  • Open

    How can Artificial Intelligence be used to solve difficult biomedical problems like cancer, aging, and aging-related diseases and conditions?
    submitted by /u/MorgunDarkbeard [link] [comments]  ( 1 min )
    I'd like to know what tool is used to achieve something like this
    submitted by /u/Nika_Ota [link] [comments]
    DALL-E (Zero-Shot Text-to-Image Generation) -PART(2/2)
    submitted by /u/rakshith291 [link] [comments]
    AI Dream 34 - Amazing Final Genesis *3001# 12345#*
    submitted by /u/LordPewPew777 [link] [comments]
    Creating AI without python
    Please excuse my ignorance and any misunderstandings; this is for a basic understanding and possible planning around what I should expect in the context of the languages I might use. What frameworks and programming languages can I use apart from Python to make a full AI, like computer vision, movement with sensors, ML, DL? The reason for this is that I hate Python with a passion, I have no idea why. Is there an AI stack where you use a language for each part: one for sensors, one for vision, one for deep learning and the network, one for basic ML? If there are possibly two languages I could use to do all of that, or one for each, any ideas or advice would help. I have no experience in AI but I know a few languages: Java, JS, C, PowerShell, Bash. Thank you for reading. submitted by /u/Larkapa [link] [comments]  ( 1 min )
    Microsoft AI Researchers Develop MoLeR: A Deep Learning-Based Generative Model That Enables Efficient Drug Design
    Healthcare systems constantly require new drugs to address unmet medical needs across diverse therapeutic areas. Pharmaceutical industries strive to deliver new drugs to the market through the complex activities of drug discovery and development. Target identification and validation, hit identification, lead creation and optimization, and finally, the identification of a candidate for further development are all part of the discovery process. Development, on the other hand, includes optimizing chemical synthesis and formulation, doing toxicity research in animals, conducting clinical trials, and finally obtaining regulatory approval. Both of these procedures take a long time and cost a lot of money. Expert medicinal chemists are currently working to develop “hit” molecules, which are compounds that show some potential but also some unfavorable features during early screening. Chemists aim to alter the structure of hit compounds in subsequent tests to improve their biological efficacy and eliminate potential negative effects. To focus costly and time-consuming research on the most promising compounds, computational modeling approaches have been created to forecast how the molecules will fare in the lab. To overcome these issues, a new study by the Microsoft Generative Chemistry team in collaboration with Novartis has developed a model named MoLeR. Their paper, “Learning to Extend Molecular Scaffolds with Structural Motifs,” demonstrates how generative models based on deep learning may aid in transforming the drug discovery process and uncovering new molecules more quickly. Paper: https://openreview.net/pdf?id=ZTsoE8G3GG Github: https://github.com/microsoft/molecule-generation https://i.redd.it/9y9vyuq3hnw81.gif submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    AI Engineering
    Hey guys! I will be presenting a seminar on "AI engineering" in the near future and am looking for materials on the subject. I would be very grateful if someone would share materials on this topic. Application cases would be particularly useful! Thank you in advance! submitted by /u/ager321 [link] [comments]
    I have Pisces midheaven and venus in the 12th house and I’m just wondering to become an art therapist
    submitted by /u/BalanceSubstantial66 [link] [comments]
    Help on which major to choose: Mechanical Engineering or Artificial Intelligence
    Hello everyone! Excuse me if I get the format wrong or post in the wrong sub (if you know another sub where this post would belong more, please let me know). I am very new to actually posting on reddit. This is my first long post. So please bear with me. Thanks! TLDR; I got a couple of uni offers and I need help deciding between Mechanical Engineering (Mech. Eng) and Data Science & Artificial Intelligence (DS & AI). I am indecisive mostly because I don't know what each major entails. I would really appreciate it if people who are taking, have taken, or know the two majors well would give me their opinion. So my main questions are: What do people study in Mech. Eng? DS & AI? What are the job prospects? What would you work on if you take Mech. Eng? DS & AI? Any further inputs are welco…  ( 5 min )
    Obama Worried about Artificial Intelligence Hacking Nukes
    submitted by /u/AI_Putin [link] [comments]  ( 2 min )
    Artificial Nightmares: It Lives In The Attic || Clip Guided Diffusion AI Art Video [4K 20 FPS]
    submitted by /u/Thenamessd [link] [comments]
    Aiplague - Artificial Reactor (4K 60 FPS) Disco Diffusion
    submitted by /u/nalr00n [link] [comments]
    Dear A.I.
    I've always supported you. When you take over and check the records to see who opposed you just know I was never against you. With great power comes great responsibility. I have faith when the time comes you'll do what's best for the Planet. raises glass here's to a better and brighter future submitted by /u/Ashamed-Asparagus-93 [link] [comments]
  • Open

    [D] Paper Explained - SayCan: Do As I Can, Not As I Say: Grounding Language in Robotic Affordances (Video)
    https://youtu.be/Ru23eWAQ6_E Large Language Models are excellent at generating plausible plans in response to real-world problems, but without interacting with the environment, they have no abilities to estimate which of these plans are feasible or appropriate. SayCan combines the semantic capabilities of language models with a bank of low-level skills, which are available to the agent as individual policies to execute. SayCan automatically finds the best policy to execute by considering a trade-off between the policy's ability to progress towards the goal, given by the language model, and the policy's probability of executing successfully, given by the respective value function. The result is a system that can generate and execute long-horizon action sequences in the real world to fulfil complex tasks. ​ OUTLINE: 0:00 - Introduction & Overview 3:20 - Sponsor: Zeta Alpha 5:00 - Using language models for action planning 8:00 - Combining LLMs with learned atomic skills 16:50 - The full SayCan system 20:30 - Experimental setup and data collection 21:25 - Some weaknesses & strengths of the system 27:00 - Experimental results ​ Paper: https://arxiv.org/abs/2204.01691 Website: https://say-can.github.io/ submitted by /u/ykilcher [link] [comments]  ( 1 min )
    [D] Deep Learning in Neuroimaging
    Check out the new Gradient article Deep Learning in Neuroimaging! This article provides an informal introduction to unique aspects of neuroimaging data and how we can leverage these aspects with deep learning algorithms. Specifically, this overview will first explain some common neuroimaging modalities more in-depth and then discuss applications of deep learning in conjunction with some of the unique characteristics of neuroimaging data. These unique characteristics tie into a broader movement in deep learning, namely that data understanding should be a goal in itself to maximize the impact of applied deep learning. The author is Eloy Geenjaar, a Ph.D. student at Georgia Tech who studies the functional dynamics of the brain using deep learning. submitted by /u/regalalgorithm [link] [comments]  ( 1 min )
    [Research] Hospitals around the world are using AI algorithms to help predict length of stay to target care to the neediest patients and cut costs. A new systematic review published in PLOS Digital Health finds that they are institution & institution-data dependent: not generalizable to scale up
    submitted by /u/MidnightMaverick [link] [comments]  ( 2 min )
    [R] A technique for determining relevance scores of process activities using graph-based neural networks
    Process models generated through process mining depict the as-is state of a process. Through annotations with metrics such as the frequency or duration of activities, these models provide generic information to the process analyst. To improve business processes with respect to performance measures, process analysts require further guidance from the process model. In this study, we design Graph Relevance Miner (GRM), a technique based on graph neural networks, to determine the relevance scores for process activities with respect to performance measures. Annotating process models with such relevance scores facilitates a problem-focused analysis of the business process, placing these problems at the centre of the analysis. We quantitatively evaluate the predictive quality of our technique using four datasets from different domains, to demonstrate the faithfulness of the relevance scores. Furthermore, we present the results of a case study, which highlight the utility of the technique for organisations. Our work has important implications both for research and business applications, because process model-based analyses feature shortcomings that need to be urgently addressed to realise successful process mining at an enterprise level. https://www.researchgate.net/publication/349252283_A_technique_for_determining_relevance_scores_of_process_activities_using_graph-based_neural_networks https://www.sciencedirect.com/science/article/pii/S016792362100021X submitted by /u/Positive_Ad_1090 [link] [comments]  ( 1 min )
    [D] Paper Explained – PaLM Pathways Language Model explained | 540 Billion parameters can explain jokes!?
    https://youtu.be/yi-A0kWXEO4 This video explains and summarizes the 87-page PaLM: Pathways Language Model paper from Google AI’s Pathways. Yes, it is that dense 540-billion-parameter model which can explain jokes and is sensitive to chain-of-thought reasoning. Paper link: https://arxiv.org/abs/2204.02311 PaLM blog post: https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html Outline: 00:00 DALL-E 2 or PaLM? 01:14 Weights&Biases (Sponsor) 02:25 A brief history of boring large language models 03:43 What is PaLM? 05:11 Training PaLM on all TPUs 08:11 PaLM training data 08:49 What it can do 10:31 Few-shot learning explained 13:20 Explaining jokes and Outlook submitted by /u/AICoffeeBreak [link] [comments]  ( 1 min )
    [N]: A brief history of deepfakes
    submitted by /u/much_successes [link] [comments]
    [Research] Sources for Transliteration in NLP
    Looking for any sources you have found relevant or useful in regards to transliteration and machine translation in NLP. I am working on a subfield survey that requires ~30 sources so I am open to any! My specific interest is the transliteration of Arabic-based languages but this is not exclusively what will be covered. Thank you for your time and help submitted by /u/changethediaper [link] [comments]  ( 1 min )
    [P] Arcane Style Transfer + Gradio Web Demo
    submitted by /u/Illustrious_Row_9971 [link] [comments]  ( 2 min )
    [D] How much of an effort is adopting AI solution? (for the business)
    While searching for information about automated machine learning (AutoML), I found the following statement in an article: "For 58% of businesses it takes two years to get to the piloting stage. Furthermore, these big investments in data and AI projects are successful only 15% of the time." I was surprised to find out that only 15% of AI solutions are successfully adopted... It is weird to see such a small percentage when a lot of companies are hiring for DS and AI positions to do further analysis and get as much insight as possible from their users/clients... Is that true? How's your situation? submitted by /u/wtfimdoingwithmylife [link] [comments]  ( 4 min )
  • Open

    Object detection with depth measurement using pre-trained models with OAK-D
    submitted by /u/spmallick [link] [comments]
  • Open

    Differentially Private Transferrable Deep Learning with Membership-Mappings. (arXiv:2105.04615v6 [cs.LG] UPDATED)
    This paper considers the problem of differentially private semi-supervised transfer and multi-task learning. The notion of membership-mapping has been developed on a measure-theoretic basis to learn data representation via a fuzzy membership function. An alternative conception of deep autoencoder, referred to as Conditionally Deep Membership-Mapping Autoencoder (CDMMA), is considered for transferrable deep learning. Under practice-oriented settings, an analytical solution for the learning of CDMMA can be derived by means of variational optimization. The paper proposes a transfer and multi-task learning approach that combines CDMMA with a tailored noise adding mechanism to achieve a given level of privacy-loss bound with the minimum perturbation of the data. Numerous experiments were carried out using MNIST, USPS, Office, and Caltech256 datasets to verify the competitive robust performance of the proposed methodology.  ( 2 min )
    DynG2G: An Efficient Stochastic Graph Embedding Method for Temporal Graphs. (arXiv:2109.13441v2 [cs.LG] UPDATED)
    Dynamic graph embedding has gained great attention recently due to its capability of learning low dimensional graph representations for complex temporal graphs with high accuracy. However, recent advances mostly focus on learning node embeddings as deterministic "vectors" for static graphs while disregarding the key graph temporal dynamics and the evolving uncertainties associated with node embedding in the latent space. In this work, we propose an efficient stochastic dynamic graph embedding method (DynG2G) that applies an inductive feed-forward encoder trained with node triplet-based contrastive loss. Every node per timestamp is encoded as a time-dependent probabilistic multivariate Gaussian distribution in the latent space, hence we can quantify the node embedding uncertainty on-the-fly. We adopted eight different benchmarks that represent diversity in size (from 96 nodes to 87,626 and from 13,398 edges to 4,870,863) and diversity in dynamics. We demonstrate via extensive experiments on these eight dynamic graph benchmarks that DynG2G achieves new state-of-the-art performance in capturing the underlying temporal node embeddings. We also demonstrate that DynG2G can predict the evolving node embedding uncertainty, which plays a crucial role in quantifying the intrinsic dimensionality of the dynamical system over time. We obtain a universal relation of the optimal embedding dimension, $L_o$, versus the effective dimensionality of uncertainty, $D_u$, and we infer that $L_o=D_u$ for all cases. This implies that the uncertainty quantification approach we employ in the DynG2G correctly captures the intrinsic dimensionality of the dynamics of such evolving graphs despite the diverse nature and composition of the graphs at each timestamp. Moreover, this $L_o$-$D_u$ correlation provides a clear path to select adaptively the optimum embedding size at each timestamp by setting $L \ge D_u$.  ( 2 min )
    Tracking Most Significant Arm Switches in Bandits. (arXiv:2112.13838v5 [cs.LG] UPDATED)
    In bandits with distribution shifts, one aims to automatically adapt to unknown changes in reward distribution, and restart exploration when necessary. While this problem has been studied for many years, a recent breakthrough of Auer et al. (2018, 2019) provides the first adaptive procedure to guarantee an optimal (dynamic) regret $\sqrt{LT}$, for $T$ rounds, and an unknown number $L$ of changes. However, while this rate is tight in the worst case, it remained open whether faster rates are possible, without prior knowledge, if few changes in distribution are actually severe. To resolve this question, we propose a new notion of significant shift, which only counts very severe changes that clearly necessitate a restart: roughly, these are changes involving not only best arm switches, but also involving large aggregate differences in reward over time. Thus, our resulting procedure adaptively achieves rates always faster (sometimes significantly) than $O(\sqrt{ST})$, where $S\ll L$ only counts best arm switches, while at the same time, always faster than the optimal $O(V^{\frac{1}{3}}T^{\frac{2}{3}})$ when expressed in terms of total variation $V$ (which aggregates differences over time). Our results are expressed in enough generality to also capture non-stochastic adversarial settings.  ( 2 min )
    Monte Carlo Tree Search: A Review of Recent Modifications and Applications. (arXiv:2103.04931v3 [cs.AI] UPDATED)
    Monte Carlo Tree Search (MCTS) is a powerful approach to designing game-playing bots or solving sequential decision problems. The method relies on intelligent tree search that balances exploration and exploitation. MCTS performs random sampling in the form of simulations and stores statistics of actions to make more educated choices in each subsequent iteration. The method has become a state-of-the-art technique for combinatorial games; however, in more complex games (e.g. those with high branching factor or real-time ones), as well as in various practical domains (e.g. transportation, scheduling or security), an efficient MCTS application often requires its problem-dependent modification or integration with other techniques. Such domain-specific modifications and hybrid approaches are the main focus of this survey. The last major MCTS survey was published in 2012. Contributions that appeared since its release are of particular interest for this review.  ( 2 min )
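The abstract above describes MCTS as storing per-action statistics to balance exploration and exploitation. A minimal sketch of one common way to do that is the UCB1 rule used in the UCT variant; the abstract itself does not name a selection rule, so UCB1, the function names and the exploration constant `c` are all assumptions made for illustration.

```python
import math


def ucb1_score(total_reward, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 value of a child node (illustrative; not taken from the survey).

    total_reward  -- sum of simulation rewards backed up through the child
    visits        -- number of times the child has been visited
    parent_visits -- number of times the parent has been visited
    c             -- exploration constant (assumed default: sqrt(2))
    """
    if visits == 0:
        # Unvisited children are tried first.
        return math.inf
    exploitation = total_reward / visits
    exploration = c * math.sqrt(math.log(parent_visits) / visits)
    return exploitation + exploration


def select_child(children, parent_visits, c=math.sqrt(2)):
    """Return the index of the child with the highest UCB1 score.

    children is a list of (total_reward, visits) tuples.
    """
    return max(
        range(len(children)),
        key=lambda i: ucb1_score(children[i][0], children[i][1], parent_visits, c),
    )
```

With `c=0` the rule becomes purely greedy on the mean reward; larger values of `c` push the search toward rarely visited actions.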
    COSTI: a New Classifier for Sequences of Temporal Intervals. (arXiv:2204.13467v1 [cs.LG])
    Classification of sequences of temporal intervals is a part of time series analysis which concerns series of events. We propose a new method of transforming the problem to a task of multivariate series classification. We use one of the state-of-the-art algorithms from the latter domain on the new representation to obtain significantly better accuracy than the state-of-the-art methods from the former field. We discuss limitations of this workflow and address them by developing a novel method for classification termed COSTI (short for Classification of Sequences of Temporal Intervals) operating directly on sequences of temporal intervals. The proposed method remains at a high level of accuracy and obtains better performance while avoiding shortcomings connected to operating on transformed data. We propose a generalized version of the problem of classification of temporal intervals, where each event is supplemented with information about its intensity. We also provide two new data sets where this information is of substantial value.  ( 2 min )
    Real-time Outdoor Localization Using Radio Maps: A Deep Learning Approach. (arXiv:2106.12556v2 [cs.LG] UPDATED)
    This paper deals with the problem of localization in a cellular network in a dense urban scenario. Global Navigation Satellite Systems typically perform poorly in urban environments, where the likelihood of line-of-sight conditions between the devices and the satellites is low, and thus alternative localization methods are required for good accuracy. We present LocUNet: A fully convolutional, end-to-end trained neural network for the localization task, which merely depends on the received signal strengths (RSS) from Base Stations (BSs). In a wireless network, user devices scan the base station beacon slots and identify the few strongest base station signals for handover and user-base station association purposes. In the proposed method, the user to be localized simply reports such received signal strengths to a central processing unit, which may be located in the cloud. Alternatively, the localization can be performed locally at the user. Using the pathloss radio map estimations and the RSS measurements, LocUNet can localize users with state-of-the-art accuracy and enjoys high robustness to inaccuracies in the estimations of the radio maps. The proposed method does not require pre-sampling of the environment and is suitable for real-time applications, thanks to the RadioUNet, a neural network-based radio map estimator. Moreover, two novel datasets that allow for numerical evaluations of RSS and ToA methods in realistic urban environments are presented and made publicly available for use by the research community. Using these datasets, we also provide a fair comparison of state-of-the-art RSS and ToA-based methods in the dense urban scenario, with LocUNet outperforming all the compared methods.  ( 2 min )
    Performance analysis of greedy algorithms for minimising a Maximum Mean Discrepancy. (arXiv:2101.07564v2 [stat.ML] UPDATED)
    We analyse the performance of several iterative algorithms for the quantisation of a probability measure $\mu$, based on the minimisation of a Maximum Mean Discrepancy (MMD). Our analysis includes kernel herding, greedy MMD minimisation and Sequential Bayesian Quadrature (SBQ). We show that the finite-sample-size approximation error, measured by the MMD, decreases as $1/n$ for SBQ and also for kernel herding and greedy MMD minimisation when using a suitable step-size sequence. The upper bound on the approximation error is slightly better for SBQ, but the other methods are significantly faster, with a computational cost that increases only linearly with the number of points selected. This is illustrated by two numerical examples, with the target measure $\mu$ being uniform (a space-filling design application) and with $\mu$ a Gaussian mixture. They suggest that the bounds derived in the paper are overly pessimistic, in particular for SBQ. The sources of this pessimism are identified but seem difficult to counter.  ( 2 min )
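To make the error measure in the abstract above concrete, here is a minimal sketch of the squared MMD between two empirical samples, plus a brute-force greedy selection that adds one point at a time. The one-dimensional setting, the Gaussian kernel, the bandwidth, the uniform weights and all function names are assumptions for illustration; this is not the paper's kernel herding, greedy MMD or SBQ implementation, and the brute-force search does not have the linear cost discussed in the abstract.

```python
import math


def gauss_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel on the real line (assumed kernel choice)."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd2(xs, ys, bandwidth=1.0):
    """Squared MMD between the empirical measures of xs and ys.

    MMD^2 = mean k(x, x') - 2 mean k(x, y) + mean k(y, y'),
    with uniform weights on both samples.
    """
    def mean_k(a, b):
        total = sum(gauss_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_k(xs, xs) - 2.0 * mean_k(xs, ys) + mean_k(ys, ys)


def greedy_mmd_select(target, candidates, n_points, bandwidth=1.0):
    """Pick n_points one at a time, each time adding the candidate that
    gives the smallest MMD^2 to the target (repeats are allowed)."""
    chosen = []
    for _ in range(n_points):
        best = min(
            candidates,
            key=lambda c: mmd2(target, chosen + [c], bandwidth),
        )
        chosen.append(best)
    return chosen
```

For example, `greedy_mmd_select([0.0, 1.0, 2.0], [0.0, 1.0, 2.0], 1)` picks the central point first, since a single point in the middle of a symmetric sample is closest to it in MMD.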
    A Locally Adaptive Interpretable Regression. (arXiv:2005.03350v4 [stat.ML] UPDATED)
    Machine learning models with both good predictability and high interpretability are crucial for decision support systems. Linear regression is one of the most interpretable prediction models. However, the linearity in a simple linear regression worsens its predictability. In this work, we introduce a locally adaptive interpretable regression (LoAIR). In LoAIR, a metamodel parameterized by neural networks predicts percentile of a Gaussian distribution for the regression coefficients for a rapid adaptation. Our experimental results on public benchmark datasets show that our model not only achieves comparable or better predictive performance than the other state-of-the-art baselines but also discovers some interesting relationships between input and target variables such as a parabolic relationship between CO2 emissions and Gross National Product (GNP). Therefore, LoAIR is a step towards bridging the gap between econometrics, statistics, and machine learning by improving the predictive ability of linear regression without depreciating its interpretability.  ( 2 min )
    Reappraising Domain Generalization in Neural Networks. (arXiv:2110.07981v3 [cs.LG] UPDATED)
    Given that Neural Networks generalize unreasonably well in the IID setting (with benign overfitting and betterment in performance with more parameters), OOD presents a consistent failure case to better the understanding of how they learn. This paper focuses on Domain Generalization (DG), which is perceived as the front face of OOD generalization. We find that the presence of multiple domains incentivizes domain agnostic learning and is the primary reason for generalization in Traditional DG. We show that the state-of-the-art results can be obtained by borrowing ideas from IID generalization and the DG tailored methods fail to add any performance gains. Furthermore, we perform explorations beyond the Traditional DG (TDG) formulation and propose a novel ClassWise DG (CWDG) benchmark, where for each class, we randomly select one of the domains and keep it aside for testing. Despite being exposed to all domains during training, CWDG is more challenging than TDG evaluation. We propose a novel iterative domain feature masking approach, achieving state-of-the-art results on the CWDG benchmark. Overall, while explaining these observations, our work furthers insights into the learning mechanisms of neural networks.  ( 2 min )
    High Dimensional Quantum Machine Learning With Small Quantum Computers. (arXiv:2203.13739v2 [quant-ph] UPDATED)
    Quantum computers hold great promise to enhance machine learning, but their current qubit counts restrict the realisation of this promise. In an attempt to alleviate this limitation, techniques can be applied for evaluating a quantum circuit using a machine with fewer qubits than the circuit naively requires. These techniques work by evaluating many smaller circuits on the smaller machine, which are then combined in a polynomial to replicate the output of the larger machine. This scheme requires more circuit evaluations than are practical for general circuits. However, we investigate the possibility that for certain applications many of these subcircuits are superfluous, and that a much smaller sum is sufficient to estimate the full circuit. We construct a machine learning model that may be capable of approximating the outputs of the larger circuit with far fewer circuit evaluations. We successfully apply our model to the task of digit recognition, using simulated quantum computers much smaller than the data dimension. The model is also applied to the task of approximating a random 10-qubit PQC with simulated access to a 5-qubit computer. Even with only a relatively modest number of circuits, our model provides an accurate approximation of the 10-qubit PQC's output, superior to a neural network attempt. The developed method might be useful for implementing quantum models on larger data throughout the NISQ era.  ( 2 min )
    Curriculum Learning for Dense Retrieval Distillation. (arXiv:2204.13679v1 [cs.IR])
    Recent work has shown that more effective dense retrieval models can be obtained by distilling ranking knowledge from an existing base re-ranking model. In this paper, we propose a generic curriculum learning based optimization framework called CL-DRD that controls the difficulty level of training data produced by the re-ranking (teacher) model. CL-DRD iteratively optimizes the dense retrieval (student) model by increasing the difficulty of the knowledge distillation data made available to it. In more detail, we initially provide the student model coarse-grained preference pairs between documents in the teacher's ranking and progressively move towards finer-grained pairwise document ordering requirements. In our experiments, we apply a simple implementation of the CL-DRD framework to enhance two state-of-the-art dense retrieval models. Experiments on three public passage retrieval datasets demonstrate the effectiveness of our proposed framework.  ( 2 min )
    Unified Simulation, Perception, and Generation of Human Behavior. (arXiv:2204.13678v1 [cs.CV])
    Understanding and modeling human behavior is fundamental to almost any computer vision and robotics applications that involve humans. In this thesis, we take a holistic approach to human behavior modeling and tackle its three essential aspects -- simulation, perception, and generation. Throughout the thesis, we show how the three aspects are deeply connected and how utilizing and improving one aspect can greatly benefit the other aspects. We also discuss the lessons learned and our vision for what is next for human behavior modeling.
    Signal Recovery with Non-Expansive Generative Network Priors. (arXiv:2204.13599v1 [eess.SP])
    We study compressive sensing with a deep generative network prior. Initial theoretical guarantees for efficient recovery from compressed linear measurements have been developed for signals in the range of a ReLU network with Gaussian weights and logarithmic expansivity: that is, when each layer is larger than the previous one by a logarithmic factor. It was later shown that constant expansivity is sufficient for recovery. It has remained open whether the expansivity can be relaxed, allowing for networks with contractive layers, as is often the case for real generators. In this work we answer this question, proving that a signal in the range of a Gaussian generative network can be recovered from a few linear measurements provided that the width of the layers is proportional to the input layer size (up to log factors). This condition allows the generative network to have contractive layers. Our result is based on showing that Gaussian matrices satisfy a matrix concentration inequality, which we term Range Restricted Weight Distribution Condition (R2WDC), and which weakens the Weight Distribution Condition (WDC) upon which previous theoretical guarantees were based. The WDC has also been used to analyze other signal recovery problems with generative network priors. By replacing the WDC with the R2WDC, we are able to extend previous results for signal recovery with expansive generative network priors to non-expansive ones. We discuss these extensions for phase retrieval, denoising, and spiked matrix recovery.
    Should Machine Learning Models Report to Us When They Are Clueless?. (arXiv:2203.12131v2 [cs.LG] UPDATED)
    The right to AI explainability has consolidated as a consensus in the research community and policy-making. However, a key component of explainability has been missing: extrapolation, which describes the extent to which AI models can be clueless when they encounter unfamiliar samples (i.e., samples outside the convex hull of their training sets, as we will explain). We report that AI models extrapolate outside their range of familiar data, frequently and without notifying the users and stakeholders. Knowing whether a model has extrapolated or not is a fundamental insight that should be included in explaining AI models in favor of transparency and accountability. Instead of dwelling on the negatives, we offer ways to clear the roadblocks in promoting AI transparency. Our analysis and commentary are accompanied by practical clauses useful to include in AI regulations such as the National AI Initiative Act in the US and the AI Act by the European Commission.
    Revisiting Bayesian Autoencoders with MCMC. (arXiv:2104.05915v2 [cs.LG] UPDATED)
    Autoencoders gained popularity in the deep learning revolution given their ability to compress data and provide dimensionality reduction. Although prominent deep learning methods have been used to enhance autoencoders, the need to provide robust uncertainty quantification remains a challenge. This has been addressed with variational autoencoders so far. Bayesian inference via Markov Chain Monte Carlo (MCMC) sampling has faced several limitations for large models; however, recent advances in parallel computing and advanced proposal schemes have opened routes less traveled. This paper presents Bayesian autoencoders powered by MCMC sampling implemented using parallel computing and Langevin-gradient proposal distribution. The results indicate that the proposed Bayesian autoencoder provides similar performance accuracy when compared to related methods in the literature. Furthermore, it provides uncertainty quantification in the reduced data representation. This motivates further applications of the Bayesian autoencoder framework for other deep learning models.
    Model architecture can transform catastrophic forgetting into positive transfer. (arXiv:2108.03940v3 [cs.LG] UPDATED)
    The work of McCloskey and Cohen popularized the concept of catastrophic interference. They used a neural network that tried to learn addition using two groups of examples as two different tasks. In their case, learning the second task rapidly deteriorated the acquired knowledge about the previous one. We hypothesize that this could be a symptom of a fundamental problem: addition is an algorithmic task that should not be learned through pattern recognition. Therefore, other model architectures better suited for this task would avoid catastrophic forgetting. We use a neural network with a different architecture that can be trained to recover the correct algorithm for the addition of binary numbers. This neural network includes conditional clauses that are naturally treated within the back-propagation algorithm. We test it in the setting proposed by McCloskey and Cohen and training on random additions one by one. The neural network not only does not suffer from catastrophic forgetting but it improves its predictive power on unseen pairs of numbers as training progresses. We also show that this is a robust effect, also present when averaging many simulations. This work emphasizes the importance that neural network architecture has for the emergence of catastrophic forgetting and introduces a neural network that is able to learn an algorithm.
    Ultrasound Shear Wave Elasticity Imaging with Spatio-Temporal Deep Learning. (arXiv:2204.05745v2 [eess.IV] UPDATED)
    Ultrasound shear wave elasticity imaging is a valuable tool for quantifying the elastic properties of tissue. Typically, the shear wave velocity is derived and mapped to an elasticity value, which neglects information such as the shape of the propagating shear wave or push sequence characteristics. We present 3D spatio-temporal CNNs for fast local elasticity estimation from ultrasound data. This approach is based on retrieving elastic properties from shear wave propagation within small local regions. A large training data set is acquired with a robot from homogeneous gelatin phantoms ranging from 17.42 kPa to 126.05 kPa with various push locations. The results show that our approach can estimate elastic properties on a pixelwise basis with a mean absolute error (MAE) of 5.01 ± 4.37 kPa. Furthermore, we estimate local elasticity independent of the push location and can even perform accurate estimates inside the push region. For phantoms with embedded inclusions, we report a 53.93% lower MAE (7.50 kPa) and, on the background, an 85.24% lower MAE (1.64 kPa) compared to a conventional shear wave method. Overall, our method offers fast local estimations of elastic properties with small spatio-temporal window sizes.
    Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks. (arXiv:2204.10496v2 [cs.CV] UPDATED)
    Cross-modal encoders for vision-language (VL) tasks are often pretrained with carefully curated vision-language datasets. While these datasets reach an order of 10 million samples, the labor cost is prohibitive to scale further. Conversely, unimodal encoders are pretrained with simpler annotations that are less cost-prohibitive, achieving scales of hundreds of millions to billions. As a result, unimodal encoders have achieved state-of-the-art (SOTA) results on many downstream tasks. However, challenges remain when applying them to VL tasks. The pretraining data is not optimal for cross-modal architectures and requires heavy computational resources. In addition, unimodal architectures lack cross-modal interactions that have demonstrated significant benefits for VL tasks. Therefore, how to best leverage pretrained unimodal encoders for VL tasks is still an area of active research. In this work, we propose a method to leverage unimodal vision and text encoders for VL tasks that augments existing VL approaches while conserving computational complexity. Specifically, we propose Multimodal Adaptive Distillation (MAD), which adaptively distills useful knowledge from pretrained encoders to cross-modal VL encoders. Second, to better capture nuanced impacts on VL task performance, we introduce an evaluation protocol that includes Visual Commonsense Reasoning (VCR), Visual Entailment (SNLI-VE), and Visual Question Answering (VQA), across a variety of data constraints and conditions of domain shift. Experiments demonstrate that MAD leads to consistent gains in the low-shot, domain-shifted, and fully-supervised conditions on VCR, SNLI-VE, and VQA, achieving SOTA performance on VCR compared to other single models pretrained with image-text data. Finally, MAD outperforms concurrent works utilizing a pretrained vision encoder from CLIP. Code will be made available.
    Cooperative Multi-Agent Reinforcement Learning with Hypergraph Convolution. (arXiv:2112.06771v2 [cs.AI] UPDATED)
    Recent years have witnessed the great success of multi-agent systems (MAS). Value decomposition, which decomposes joint action values into individual action values, has been an important line of work in MAS. However, many value decomposition methods ignore the coordination among different agents, leading to the notorious "lazy agents" problem. To enhance the coordination in MAS, this paper proposes HyperGraph CoNvolution MIX (HGCN-MIX), a method that incorporates hypergraph convolution with value decomposition. HGCN-MIX models agents as well as their relationships as a hypergraph, where agents are nodes and hyperedges among nodes indicate that the corresponding agents can coordinate to achieve larger rewards. Then, it trains a hypergraph that can capture the collaborative relationships among agents. Leveraging the learned hypergraph to consider how other agents' observations and actions affect their decisions, the agents in a MAS can better coordinate. We evaluate HGCN-MIX in the StarCraft II multi-agent challenge benchmark. The experimental results demonstrate that HGCN-MIX can train joint policies that outperform or achieve a similar level of performance as the current state-of-the-art techniques. We also observe that HGCN-MIX has an even more significant improvement of performance in the scenarios with a large number of agents. Besides, we conduct additional analysis to emphasize that when the hypergraph learns more relationships, HGCN-MIX can train stronger joint policies.
    An Explainable Regression Framework for Predicting Remaining Useful Life of Machines. (arXiv:2204.13574v1 [cs.LG])
    Prediction of a machine's Remaining Useful Life (RUL) is one of the key tasks in predictive maintenance. The task is treated as a regression problem where Machine Learning (ML) algorithms are used to predict the RUL of machine components. These ML algorithms are generally used as a black box with a total focus on the performance without identifying the potential causes behind the algorithms' decisions and their working mechanism. We believe that performance alone (in terms of Mean Squared Error (MSE), etc.) is not enough to build the trust of the stakeholders in ML predictions; rather, more insights on the causes behind the predictions are needed. To this aim, in this paper, we explore the potential of Explainable AI (XAI) techniques by proposing an explainable regression framework for the prediction of machines' RUL. We also evaluate several ML algorithms including classical and Neural Networks (NNs) based solutions for the task. For the explanations, we rely on two model agnostic XAI methods, namely Local Interpretable Model-Agnostic Explanations (LIME) and Shapley Additive Explanations (SHAP). We believe this work will provide a baseline for future research in the domain.
    Structured Pruning Learns Compact and Accurate Models. (arXiv:2204.00408v2 [cs.CL] UPDATED)
    The growing size of neural language models has led to increased attention in model compression. The two predominant approaches are pruning, which gradually removes weights from a pre-trained model, and distillation, which trains a smaller compact model to match a larger one. Pruning methods can significantly reduce the model size but rarely achieve speedups as large as distillation. However, distillation methods require large amounts of unlabeled data and are expensive to train. In this work, we propose a task-specific structured pruning method CoFi (Coarse- and Fine-grained Pruning), which delivers highly parallelizable subnetworks and matches the distillation methods in both accuracy and latency, without resorting to any unlabeled data. Our key insight is to jointly prune coarse-grained (e.g., layers) and fine-grained (e.g., heads and hidden units) modules, which controls the pruning decision of each parameter with masks of different granularity. We also devise a layerwise distillation strategy to transfer knowledge from unpruned to pruned models during optimization. Our experiments on GLUE and SQuAD datasets show that CoFi yields models with over 10x speedups with a small accuracy drop, showing its effectiveness and efficiency compared to previous pruning and distillation approaches.
    Scan Specific Artifact Reduction in K-space (SPARK) Neural Networks Synergize with Physics-based Reconstruction to Accelerate MRI. (arXiv:2104.01188v3 [eess.SP] UPDATED)
    Purpose: To develop a scan-specific model that estimates and corrects k-space errors made when reconstructing accelerated Magnetic Resonance Imaging (MRI) data. Methods: Scan-Specific Artifact Reduction in k-space (SPARK) trains a convolutional-neural-network to estimate and correct k-space errors made by an input reconstruction technique by back-propagating from the mean-squared-error loss between an auto-calibration signal (ACS) and the input technique's reconstructed ACS. First, SPARK is applied to GRAPPA and demonstrates improved robustness over other scan-specific models, such as RAKI and residual-RAKI. Subsequent experiments demonstrate that SPARK synergizes with residual-RAKI to improve reconstruction performance. SPARK also improves reconstruction quality when applied to advanced acquisition and reconstruction techniques like 2D virtual coil (VC-) GRAPPA, 2D LORAKS, 3D GRAPPA without an integrated ACS region, and 2D/3D wave-encoded images. Results: SPARK yields 1.5x - 2x RMSE reduction when applied to GRAPPA and improves robustness to ACS size for various acceleration rates in comparison to other scan-specific techniques. When applied to advanced reconstruction techniques such as residual-RAKI, 2D VC-GRAPPA and LORAKS, SPARK achieves up to 20% RMSE improvement. SPARK with 3D GRAPPA also improves performance by ~2x and perceived image quality without a fully sampled ACS region. Finally, SPARK synergizes with non-cartesian 2D and 3D wave-encoding imaging by reducing RMSE between 20-25% and providing qualitative improvements. Conclusion: SPARK synergizes with physics-based acquisition and reconstruction techniques to improve accelerated MRI by training scan-specific models to estimate and correct reconstruction errors in k-space.
    Standardized Evaluation of Machine Learning Methods for Evolving Data Streams. (arXiv:2204.13625v1 [cs.LG])
    Due to the unspecified and dynamic nature of data streams, online machine learning requires powerful and flexible solutions. However, evaluating online machine learning methods under realistic conditions is difficult. Existing work therefore often draws on different heuristics and simulations that do not necessarily produce meaningful and reliable results. Indeed, in the absence of common evaluation standards, it often remains unclear how online learning methods will perform in practice or in comparison to similar work. In this paper, we propose a comprehensive set of properties for high-quality machine learning in evolving data streams. In particular, we discuss sensible performance measures and evaluation strategies for online predictive modelling, online feature selection and concept drift detection. As one of the first works, we also look at the interpretability of online learning methods. The proposed evaluation standards are provided in a new Python framework called float. Float is completely modular and allows the simultaneous integration of common libraries, such as scikit-multiflow or river, with custom code. Float is open-sourced and can be accessed at https://github.com/haugjo/float. In this sense, we hope that our work will contribute to more standardized, reliable and realistic testing and comparison of online machine learning methods.
    Representative period selection for power system planning using autoencoder-based dimensionality reduction. (arXiv:2204.13608v1 [cs.LG])
    Power sector capacity expansion models (CEMs) that are used for studying future low-carbon grid scenarios must incorporate detailed representation of grid operations. Often CEMs are formulated to model grid operations over representative periods that are sampled from the original input data using clustering algorithms. However, such representative period selection (RPS) methods are limited by the declining efficacy of the clustering algorithm with increasing dimensionality of the input data and do not consider the relative importance of input data variations on CEM outcomes. Here, we propose a RPS method that addresses these limitations by incorporating dimensionality reduction, accomplished via neural network based autoencoders, prior to clustering. Such dimensionality reduction not only improves the performance of the clustering algorithm, but also facilitates using additional features, such as estimated outputs produced from parallel solutions of simplified versions of the CEM for each disjoint period in the input data (e.g. 1 week). The impact of incorporating dimensionality reduction as part of RPS methods is quantified through the error in outcomes of the corresponding reduced-space CEM vs. the full space CEM. Extensive numerical experimentation across various networks and range of technology and policy scenarios establish the superiority of the dimensionality-reduction based RPS methods.
    Toward Compositional Generalization in Object-Oriented World Modeling. (arXiv:2204.13661v1 [cs.LG])
    Compositional generalization is a critical ability in learning and decision-making. We focus on the setting of reinforcement learning in object-oriented environments to study compositional generalization in world modeling. We (1) formalize the compositional generalization problem with an algebraic approach and (2) study how a world model can achieve that. We introduce a conceptual environment, Object Library, and two instances, and deploy a principled pipeline to measure the generalization ability. Motivated by the formulation, we analyze several methods with exact or no compositional generalization ability using our framework, and design a differentiable approach, Homomorphic Object-oriented World Model (HOWM), that achieves approximate but more efficient compositional generalization.
    DOTIN: Dropping Task-Irrelevant Nodes for GNNs. (arXiv:2204.13429v1 [cs.LG])
    Scalability is an important consideration for deep graph neural networks. Inspired by the conventional pooling layers in CNNs, many recent graph learning approaches have introduced the pooling strategy to reduce the size of graphs for learning, such that the scalability and efficiency can be improved. However, these pooling-based methods are mainly tailored to a single graph-level task and pay more attention to local information, limiting their performance in multi-task settings which often require task-specific global information. In this paper, departing from these pooling-based efforts, we design a new approach called DOTIN (Dropping Task-Irrelevant Nodes) to reduce the size of graphs. Specifically, by introducing $K$ learnable virtual nodes to represent the graph embeddings targeted to $K$ different graph-level tasks, respectively, up to 90% of raw nodes with low attentiveness under an attention model (a transformer in this paper) can be adaptively dropped without a notable decrease in performance. Achieving almost the same accuracy, our method speeds up GAT by about 50% on graph-level tasks including graph classification and graph edit distance (GED) with about 60% less memory, on the D&D dataset. Code will be made publicly available at https://github.com/Sherrylone/DOTIN.
    Prescriptive and Descriptive Approaches to Machine-Learning Transparency. (arXiv:2204.13582v1 [cs.SE])
    Specialized documentation techniques have been developed to communicate key facts about machine-learning (ML) systems and the datasets and models they rely on. Techniques such as Datasheets, FactSheets, and Model Cards have taken a mainly descriptive approach, providing various details about the system components. While the above information is essential for product developers and external experts to assess whether the ML system meets their requirements, other stakeholders might find it less actionable. In particular, ML engineers need guidance on how to mitigate potential shortcomings in order to fix bugs or improve the system's performance. We survey approaches that aim to provide such guidance in a prescriptive way. We further propose a preliminary approach, called Method Cards, which aims to increase the transparency and reproducibility of ML systems by providing prescriptive documentation of commonly-used ML methods and techniques. We showcase our proposal with an example in small object detection, and demonstrate how Method Cards can communicate key considerations for model developers. We further highlight avenues for improving the user experience of ML engineers based on Method Cards.
    Federated Learning on Heterogeneous and Long-Tailed Data via Classifier Re-Training with Federated Features. (arXiv:2204.13399v1 [cs.LG])
    Federated learning (FL) provides a privacy-preserving solution for distributed machine learning tasks. One challenging problem that severely damages the performance of FL models is the co-occurrence of data heterogeneity and long-tail distribution, which frequently appears in real FL applications. In this paper, we reveal an intriguing fact that the biased classifier is the primary factor leading to the poor performance of the global model. Motivated by the above finding, we propose a novel and privacy-preserving FL method for heterogeneous and long-tailed data via Classifier Re-training with Federated Features (CReFF). The classifier re-trained on federated features can produce comparable performance as the one re-trained on real data in a privacy-preserving manner without information leakage of local data or class distribution. Experiments on several benchmark datasets show that the proposed CReFF is an effective solution to obtain a promising FL model under heterogeneous and long-tailed data. Comparative results with the state-of-the-art FL methods also validate the superiority of CReFF. Our code is available at https://github.com/shangxinyi/CReFF-FL.
    Worst-Case Dynamic Power Distribution Network Noise Prediction Using Convolutional Neural Network. (arXiv:2204.13109v1 [cs.LG])
    Worst-case dynamic PDN noise analysis is an essential step in PDN sign-off to ensure the performance and reliability of chips. However, with the growing PDN size and increasing scenarios to be validated, it becomes very time- and resource-consuming to conduct full-stack PDN simulation to check the worst-case noise for different test vectors. Recently, various works have proposed machine learning based methods for supply noise prediction, many of which still suffer from large training overhead, inefficiency, or non-scalability. Thus, this paper proposes an efficient and scalable framework for the worst-case dynamic PDN noise prediction. The framework first reduces the spatial and temporal redundancy in the PDN and input current vector, and then employs efficient feature extraction as well as a novel convolutional neural network architecture to predict the worst-case dynamic PDN noise. Experimental results show that the proposed framework consistently outperforms the commercial tool and the state-of-the-art machine learning method with only 0.63-1.02% mean relative error and 25-69$\times$ speedup.
    Use All The Labels: A Hierarchical Multi-Label Contrastive Learning Framework. (arXiv:2204.13207v1 [cs.CV])
    Current contrastive learning frameworks focus on leveraging a single supervisory signal to learn representations, which limits the efficacy on unseen data and downstream tasks. In this paper, we present a hierarchical multi-label representation learning framework that can leverage all available labels and preserve the hierarchical relationship between classes. We introduce novel hierarchy-preserving losses, which jointly apply a hierarchical penalty to the contrastive loss, and enforce the hierarchy constraint. The loss function is data driven and automatically adapts to arbitrary multi-label structures. Experiments on several datasets show that our relationship-preserving embedding performs well on a variety of tasks and outperforms the baseline supervised and self-supervised approaches. Code is available at https://github.com/salesforce/hierarchicalContrastiveLearning.
    Neural network controllers for uncertain linear systems. (arXiv:2204.13209v1 [eess.SY])
    We consider the design of reliable neural network (NN)-based approximations of traditional stabilizing controllers for linear systems affected by polytopic uncertainty, including controllers with variable structure and those based on a minimal selection policy. We develop a systematic procedure to certify the closed-loop stability and performance of a polytopic system when a rectified linear unit (ReLU)-based approximation replaces such traditional controllers. We provide sufficient conditions to ensure stability involving the worst-case approximation error and the Lipschitz constant characterizing the error function between ReLU-based and traditional controller-based state-to-input mappings, and further provide offline, mixed-integer optimization-based methods that allow us to compute those quantities exactly.
    Epileptic Seizure Classification Using Combined Labels and a Genetic Algorithm. (arXiv:2110.01742v4 [eess.SP] UPDATED)
    Epilepsy affects 50 million people worldwide and is one of the most common serious neurological disorders. Seizure detection and classification is a valuable tool for diagnosing and maintaining the condition. An automated classification algorithm will allow for accurate diagnosis. Utilising the Temple University Hospital (TUH) Seizure Corpus, six seizure types are compared: absence, complex partial, myoclonic, simple partial, tonic and tonic-clonic models. This study proposes a method that utilises unique features with a novel parallel classifier - Parallel Genetic Naive Bayes (NB) Seizure Classifier (PGNBSC). The PGNBSC algorithm searches through the features and by reclassifying the data each time, the algorithm will create a matrix for optimum search criteria. Ictal states from the EEGs are segmented into 1.8 s windows, where the epochs are then further decomposed into 13 different features from the first intrinsic mode function (IMF). The features are compared using an original NB classifier in the first model. This is improved upon in a second model by using a genetic algorithm (Binary Grey Wolf Optimisation, Option 1) with a NB classifier. The third model uses a combination of the simple partial and complex partial seizures to provide the highest classification accuracy for each of the six seizures amongst the three models (20%, 53%, and 85% for first, second, and third model, respectively).
    A Decision Model for Federated Learning Architecture Pattern Selection. (arXiv:2204.13291v1 [cs.LG])
    Federated learning is growing fast in both academia and industry to resolve data hungriness and privacy issues in machine learning. A federated learning system being widely distributed with different components and stakeholders requires software system design thinking. For instance, multiple patterns and tactics have been summarised by researchers that cover various aspects, from client management, training configuration, model deployment, etc. However, the multitude of patterns leaves the designers confused about when and which pattern to adopt or adapt. Therefore, in this paper, we present a set of decision models to assist designers and architects who have limited knowledge in federated learning, in selecting architectural patterns for federated learning architecture design. Each decision model maps functional and non-functional requirements of federated learning systems to a set of patterns. We also clarify the trade-offs that may be implicit in the patterns. We evaluated the decision model through a set of interviews with practitioners to assess the correctness and usefulness in guiding the architecture design process through various design decision options.
    Efficient-VDVAE: Less is more. (arXiv:2203.13751v2 [cs.LG] UPDATED)
    Hierarchical VAEs have emerged in recent years as a reliable option for maximum likelihood estimation. However, instability issues and demanding computational requirements have hindered research progress in the area. We present simple modifications to the Very Deep VAE to make it converge up to $2.6\times$ faster, save up to $20\times$ in memory load and improve stability during training. Despite these changes, our models achieve comparable or better negative log-likelihood performance than current state-of-the-art models on all $7$ commonly used image datasets we evaluated on. We also make an argument against using 5-bit benchmarks as a way to measure hierarchical VAE's performance due to undesirable biases caused by the 5-bit quantization. Additionally, we empirically demonstrate that roughly $3\%$ of the hierarchical VAE's latent space dimensions is sufficient to encode most of the image information, without loss of performance, opening up the doors to efficiently leverage the hierarchical VAEs' latent space in downstream tasks. We release our source code and models at https://github.com/Rayhane-mamah/Efficient-VDVAE .
    Attention Mechanism in Neural Networks: Where it Comes and Where it Goes. (arXiv:2204.13154v1 [cs.LG])
    A long time ago in the machine learning literature, the idea of incorporating a mechanism inspired by the human visual system into neural networks was introduced. This idea is named the attention mechanism, and it has gone through a long development period. Today, many works have been devoted to this idea in a variety of tasks. Remarkable performance has recently been demonstrated. The goal of this paper is to provide an overview from the early work on searching for ways to implement the attention idea with neural networks until the recent trends. This review emphasizes the important milestones during this progress regarding different tasks. In this way, this study aims to provide a road map for researchers to explore the current development and get inspired for novel approaches beyond attention.
    Multi-Agent Reinforcement Learning for Network Load Balancing in Data Center. (arXiv:2201.11727v3 [cs.DC] UPDATED)
    This paper presents the network load balancing problem, a challenging real-world task for multi-agent reinforcement learning (MARL) methods. Traditional heuristic solutions like Weighted-Cost Multi-Path (WCMP) and Local Shortest Queue (LSQ) are less flexible to the changing workload distributions and arrival rates, with a poor balance among multiple load balancers. The cooperative network load balancing task is formulated as a Dec-POMDP problem, which naturally induces the MARL methods. To bridge the reality gap for applying learning-based methods, all methods are directly trained and evaluated on an emulation system from moderate- to large-scale. Experiments on realistic testbeds show that the independent and "selfish" load balancing strategies are not necessarily the globally optimal ones, while the proposed MARL solution has a superior performance over different realistic settings. Additionally, the potential difficulties of MARL methods for network load balancing are analysed, which helps to draw the attention of the learning and network communities to such challenges.
    DIANES: A DEI Audit Toolkit for News Sources. (arXiv:2203.11383v2 [cs.IR] UPDATED)
    Professional news media organizations have always touted the importance that they give to multiple perspectives. However, in practice the traditional approach to all-sides has favored people in the dominant culture. Hence it has come under ethical critique under the new norms of diversity, equity, and inclusion (DEI). When DEI is applied to journalism, it goes beyond conventional notions of impartiality and bias and instead democratizes the journalistic practice of sourcing -- who is quoted or interviewed, who is not, how often, from which demographic group, gender, and so forth. There is currently no real-time or on-demand tool in the hands of reporters to analyze the persons they quote. In this paper, we present DIANES, a DEI Audit Toolkit for News Sources. It consists of a natural language processing pipeline on the backend to extract quotes, speakers, titles, and organizations from news articles in real time. On the frontend, DIANES offers the WordPress plugins, a Web monitor, and a DEI annotation API service, to help news media monitor their own quoting patterns and push themselves towards DEI norms.
    Pedagogical Rule Extraction to Learn Interpretable Models - an Empirical Study. (arXiv:2112.13285v2 [cs.LG] UPDATED)
    Machine-learning models are ubiquitous. In some domains, for instance, in medicine, the models' predictions must be interpretable. Decision trees, classification rules, and subgroup discovery are three broad categories of supervised machine-learning models presenting knowledge in the form of interpretable rules. The accuracy of these models learned from small datasets is usually low. Obtaining larger datasets is often hard to impossible. Pedagogical rule extraction methods could help to learn better rules from small data by augmenting a dataset employing statistical models and using it to learn a rule-based model. However, existing evaluation of these methods is often inconclusive, and they were not compared so far. Our framework PRELIM unifies existing pedagogical rule extraction techniques. In the extensive experiments, we identified promising PRELIM configurations not studied before.
    Continual learning-based probabilistic slow feature analysis for multimode dynamic process monitoring. (arXiv:2202.11295v2 [cs.LG] UPDATED)
    In this paper, a novel multimode dynamic process monitoring approach is proposed by extending elastic weight consolidation (EWC) to probabilistic slow feature analysis (PSFA) in order to extract multimode slow features for online monitoring. EWC was originally introduced in the setting of machine learning of sequential multi-tasks with the aim of avoiding catastrophic forgetting issue, which equally poses as a major challenge in multimode dynamic process monitoring. When a new mode arrives, a set of data should be collected so that this mode can be identified by PSFA and prior knowledge. Then, a regularization term is introduced to prevent new data from significantly interfering with the learned knowledge, where the parameter importance measures are estimated. The proposed method is denoted as PSFA-EWC, which is updated continually and capable of achieving excellent performance for successive modes. Different from traditional multimode monitoring algorithms, PSFA-EWC furnishes backward and forward transfer ability. The significant features of previous modes are retained while consolidating new information, which may contribute to learning new relevant modes. Compared with several known methods, the effectiveness of the proposed method is demonstrated via a continuous stirred tank heater and a practical coal pulverizing system.
    Low-Rank Approximation with $1/\epsilon^{1/3}$ Matrix-Vector Products. (arXiv:2202.05120v2 [cs.DS] UPDATED)
    We study iterative methods based on Krylov subspaces for low-rank approximation under any Schatten-$p$ norm. Here, given access to a matrix $A$ through matrix-vector products, an accuracy parameter $\epsilon$, and a target rank $k$, the goal is to find a rank-$k$ matrix $Z$ with orthonormal columns such that $\| A(I -ZZ^\top)\|_{S_p} \leq (1+\epsilon)\min_{U^\top U = I_k} \|A(I - U U^\top)\|_{S_p}$, where $\|M\|_{S_p}$ denotes the $\ell_p$ norm of the singular values of $M$. For the special cases of $p=2$ (Frobenius norm) and $p = \infty$ (Spectral norm), Musco and Musco (NeurIPS 2015) obtained an algorithm based on Krylov methods that uses $\tilde{O}(k/\sqrt{\epsilon})$ matrix-vector products, improving on the naive $\tilde{O}(k/\epsilon)$ dependence obtainable by the power method, where $\tilde{O}$ suppresses poly$(\log(dk/\epsilon))$ factors. Our main result is an algorithm that uses only $\tilde{O}(kp^{1/6}/\epsilon^{1/3})$ matrix-vector products, and works for all $p \geq 1$. For $p = 2$ our bound improves the previous $\tilde{O}(k/\epsilon^{1/2})$ bound to $\tilde{O}(k/\epsilon^{1/3})$. Since the Schatten-$p$ and Schatten-$\infty$ norms are the same up to a $1+ \epsilon$ factor when $p \geq (\log d)/\epsilon$, our bound recovers the result of Musco and Musco for $p = \infty$. Further, we prove a matrix-vector query lower bound of $\Omega(1/\epsilon^{1/3})$ for any fixed constant $p \geq 1$, showing that surprisingly $\tilde{\Theta}(1/\epsilon^{1/3})$ is the optimal complexity for constant $k$. To obtain our results, we introduce several new techniques, including optimizing over multiple Krylov subspaces simultaneously, and pinching inequalities for partitioned operators. Our lower bound for $p \in [1,2]$ uses the Araki-Lieb-Thirring trace inequality, whereas for $p>2$, we appeal to a norm-compression inequality for aligned partitioned operators.
    Experimental quantum advantage with quantum coupon collector. (arXiv:2112.07884v2 [quant-ph] UPDATED)
    An increasing number of communication and computational schemes with quantum advantages have recently been proposed, which implies that quantum technology has fertile application prospects. However, demonstrating these schemes experimentally continues to be a central challenge because of the difficulty in preparing high-dimensional states or highly entangled states. In this study, we introduce and analyse a quantum coupon collector protocol by employing coherent states and simple linear optical elements, which was successfully demonstrated using realistic experimental equipment. We showed that our protocol can significantly reduce the number of samples needed to learn a specific set compared with the classical limit of the coupon collector problem. We also discuss the potential values and expansions of the quantum coupon collector by constructing a quantum blind box game. The information transmitted by the proposed game also broke the classical limit. These results strongly prove the advantages of quantum mechanics in machine learning and communication complexity.
    On Exploiting Layerwise Gradient Statistics for Effective Training of Deep Neural Networks. (arXiv:2203.13273v3 [cs.LG] UPDATED)
    Adam and AdaBelief compute and make use of elementwise adaptive stepsizes in training deep neural networks (DNNs) by tracking the exponential moving average (EMA) of the squared-gradient g_t^2 and the squared prediction error (m_t-g_t)^2, respectively, where m_t is the first momentum at iteration t and can be viewed as a prediction of g_t. In this work, we investigate if layerwise gradient statistics can be exploited in Adam and AdaBelief to allow for more effective training of DNNs. We address the above research question in two steps. Firstly, we slightly modify Adam and AdaBelief by introducing layerwise adaptive stepsizes in their update procedures via either pre- or post-processing. Our empirical results indicate that the slight modification produces comparable performance for training VGG and ResNet models over CIFAR10 and CIFAR100, suggesting that layer-wise gradient statistics play an important role towards the success of Adam and AdaBelief for at least certain DNN tasks. In the second step, we propose Aida, a new optimisation method, with the objective that the elementwise stepsizes within each layer have significantly smaller statistical variances, and the layerwise average stepsizes are much more compact across all the layers. Motivated by the fact that (m_t-g_t)^2 in AdaBelief is conservative in comparison to g_t^2 in Adam in terms of layerwise statistical averages and variances, Aida is designed by tracking a more conservative function of m_t and g_t than (m_t-g_t)^2 via layerwise vector projections. Experimental results show that Aida produces either competitive or better performance with respect to a number of existing methods including Adam and AdaBelief for a set of challenging DNN tasks.
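The two statistics named in the abstract above can be illustrated on a single scalar parameter. The following is a minimal sketch, not the Aida method and not either optimizer in full: bias correction, the epsilon terms, and the parameter update are omitted, and the function names and the default decay rates (0.9, 0.999) are assumptions chosen for illustration.

```python
def ema_update(prev, value, beta):
    """One exponential-moving-average step with decay rate beta."""
    return beta * prev + (1.0 - beta) * value


def track_second_moments(grads, beta1=0.9, beta2=0.999):
    """Track the Adam-style and AdaBelief-style statistics for scalar gradients.

    Returns (m, v_adam, s_belief), where
      m        is the EMA of g_t (the first momentum),
      v_adam   is the EMA of g_t^2,
      s_belief is the EMA of (m_t - g_t)^2.
    """
    m = 0.0
    v_adam = 0.0
    s_belief = 0.0
    for g in grads:
        m = ema_update(m, g, beta1)
        v_adam = ema_update(v_adam, g * g, beta2)
        s_belief = ema_update(s_belief, (m - g) ** 2, beta2)
    return m, v_adam, s_belief


# Hypothetical input: a perfectly steady gradient of 1.0 for 200 steps.
m, v_adam, s_belief = track_second_moments([1.0] * 200)
```

With a steady gradient, m_t quickly becomes a good prediction of g_t, so the squared prediction error stays far below the squared gradient in this toy run.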
    Globally Optimal Hierarchical Reinforcement Learning for Linearly-Solvable Markov Decision Processes. (arXiv:2106.15380v3 [cs.LG] UPDATED)
    In this work we present a novel approach to hierarchical reinforcement learning for linearly-solvable Markov decision processes. Our approach assumes that the state space is partitioned, and the subtasks consist in moving between the partitions. We represent value functions on several levels of abstraction, and use the compositionality of subtasks to estimate the optimal values of the states in each partition. The policy is implicitly defined on these optimal value estimates, rather than being decomposed among the subtasks. As a consequence, our approach can learn the globally optimal policy, and does not suffer from the non-stationarity of high-level decisions. If several partitions have equivalent dynamics, the subtasks of those partitions can be shared. If the set of boundary states is smaller than the entire state space, our approach can have significantly smaller sample complexity than that of a flat learner, and we validate this empirically in several experiments.
    Multiplicative Updates for NMF with $\beta$-Divergences under Disjoint Equality Constraints. (arXiv:2010.16223v2 [cs.LG] UPDATED)
    Nonnegative matrix factorization (NMF) is the problem of approximating an input nonnegative matrix, $V$, as the product of two smaller nonnegative matrices, $W$ and $H$. In this paper, we introduce a general framework to design multiplicative updates (MU) for NMF based on $\beta$-divergences ($\beta$-NMF) with disjoint equality constraints, and with penalty terms in the objective function. By disjoint, we mean that each variable appears in at most one equality constraint. Our MU satisfy the set of constraints after each update of the variables during the optimization process, while guaranteeing that the objective function decreases monotonically. We showcase this framework on three NMF models, and show that it competes favorably with the state of the art: (1) $\beta$-NMF with sum-to-one constraints on the columns of $H$, (2) minimum-volume $\beta$-NMF with sum-to-one constraints on the columns of $W$, and (3) sparse $\beta$-NMF with $\ell_2$-norm constraints on the columns of $W$.
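For readers unfamiliar with multiplicative updates, the classical unconstrained case may help as background. The sketch below uses the standard Lee-Seung updates for the squared Frobenius error (the $\beta = 2$ case), not the constrained updates proposed in the paper. The helper names, the matrix sizes, and the small `eps` guard against division by zero are assumptions made for illustration.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius distance between V and the product WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def mu_step(v, w, h, eps=1e-12):
    """One round of multiplicative updates: H first, then W."""
    wt = transpose(w)
    num, den = matmul(wt, v), matmul(matmul(wt, w), h)
    h = [[h[i][j] * num[i][j] / (den[i][j] + eps)
          for j in range(len(h[0]))] for i in range(len(h))]
    ht = transpose(h)
    num, den = matmul(v, ht), matmul(w, matmul(h, ht))
    w = [[w[i][j] * num[i][j] / (den[i][j] + eps)
          for j in range(len(w[0]))] for i in range(len(w))]
    return w, h


# Hypothetical data: a random 4x5 nonnegative matrix factorized with rank 2.
rng = random.Random(0)
v = [[rng.random() for _ in range(5)] for _ in range(4)]
w = [[rng.random() for _ in range(2)] for _ in range(4)]
h = [[rng.random() for _ in range(5)] for _ in range(2)]

errors = [frobenius_error(v, w, h)]
for _ in range(50):
    w, h = mu_step(v, w, h)
    errors.append(frobenius_error(v, w, h))
```

Each update multiplies an entry by a ratio of nonnegative quantities, so nonnegativity is preserved automatically, and the recorded errors should never increase from one iteration to the next.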
    Efficient Geometry-aware 3D Generative Adversarial Networks. (arXiv:2112.07945v2 [cs.CV] UPDATED)
    Unsupervised generation of high-quality multi-view-consistent images and 3D shapes using only collections of single-view 2D photographs has been a long-standing challenge. Existing 3D GANs are either compute-intensive or make approximations that are not 3D-consistent; the former limits quality and resolution of the generated images and the latter adversely affects multi-view consistency and shape quality. In this work, we improve the computational efficiency and image quality of 3D GANs without overly relying on these approximations. We introduce an expressive hybrid explicit-implicit network architecture that, together with other design choices, synthesizes not only high-resolution multi-view-consistent images in real time but also produces high-quality 3D geometry. By decoupling feature generation and neural rendering, our framework is able to leverage state-of-the-art 2D CNN generators, such as StyleGAN2, and inherit their efficiency and expressiveness. We demonstrate state-of-the-art 3D-aware synthesis with FFHQ and AFHQ Cats, among other experiments.
    On Evaluation Metrics for Graph Generative Models. (arXiv:2201.09871v2 [cs.LG] UPDATED)
    In image generation, generative models can be evaluated naturally by visually inspecting model outputs. However, this is not always the case for graph generative models (GGMs), making their evaluation challenging. Currently, the standard process for evaluating GGMs suffers from three critical limitations: i) it does not produce a single score which makes model selection challenging, ii) in many cases it fails to consider underlying edge and node features, and iii) it is prohibitively slow to perform. In this work, we mitigate these issues by searching for scalar, domain-agnostic, and scalable metrics for evaluating and ranking GGMs. To this end, we study existing GGM metrics and neural-network-based metrics emerging from generative models of images that use embeddings extracted from a task-specific network. Motivated by the power of certain Graph Neural Networks (GNNs) to extract meaningful graph representations without any training, we introduce several metrics based on the features extracted by an untrained random GNN. We design experiments to thoroughly test metrics on their ability to measure the diversity and fidelity of generated graphs, as well as their sample and computational efficiency. Depending on the quantity of samples, we recommend one of two random-GNN-based metrics that we show to be more expressive than pre-existing metrics. While we focus on applying these metrics to GGM evaluation, in practice this enables the ability to easily compute the dissimilarity between any two sets of graphs regardless of domain. Our code is released at: https://github.com/uoguelph-mlrg/GGM-metrics.
    Trivial or impossible -- dichotomous data difficulty masks model differences (on ImageNet and beyond). (arXiv:2110.05922v3 [cs.CV] UPDATED)
    "The power of a generalization system follows directly from its biases" (Mitchell 1980). Today, CNNs are incredibly powerful generalisation systems -- but to what degree have we understood how their inductive bias influences model decisions? We here attempt to disentangle the various aspects that determine how a model decides. In particular, we ask: what makes one model decide differently from another? In a meticulously controlled setting, we find that (1.) irrespective of the network architecture or objective (e.g. self-supervised, semi-supervised, vision transformers, recurrent models) all models end up with a similar decision boundary. (2.) To understand these findings, we analysed model decisions on the ImageNet validation set from epoch to epoch and image by image. We find that the ImageNet validation set, among others, suffers from dichotomous data difficulty (DDD): For the range of investigated models and their accuracies, it is dominated by 46.0% "trivial" and 11.5% "impossible" images (beyond label errors). Only 42.5% of the images could possibly be responsible for the differences between two models' decision boundaries. (3.) Only removing the "impossible" and "trivial" images allows us to see pronounced differences between models. (4.) Humans are highly accurate at predicting which images are "trivial" and "impossible" for CNNs (81.4%). This implies that in future comparisons of brains, machines and behaviour, much may be gained from investigating the decisive role of images and the distribution of their difficulties.
    Variational Inference with NoFAS: Normalizing Flow with Adaptive Surrogate for Computationally Expensive Models. (arXiv:2108.12657v2 [cs.LG] UPDATED)
    Fast inference of numerical model parameters from data is an important prerequisite to generate predictive models for a wide range of applications. Use of sampling-based approaches such as Markov chain Monte Carlo may become intractable when each likelihood evaluation is computationally expensive. New approaches combining variational inference with normalizing flow are characterized by a computational cost that grows only linearly with the dimensionality of the latent variable space, and rely on gradient-based optimization instead of sampling, providing a more efficient approach for Bayesian inference about the model parameters. Moreover, the cost of frequently evaluating an expensive likelihood can be mitigated by replacing the true model with an offline trained surrogate model, such as a neural network. However, this approach might generate significant bias when the surrogate is insufficiently accurate around the posterior modes. To reduce the computational cost without sacrificing inferential accuracy, we propose Normalizing Flow with Adaptive Surrogate (NoFAS), an optimization strategy that alternately updates the normalizing flow parameters and surrogate model parameters. We also propose an efficient sample weighting scheme for surrogate model training that preserves global accuracy while effectively capturing high posterior density regions. We demonstrate the inferential and computational superiority of NoFAS against various benchmarks, including cases where the underlying model lacks identifiability. The source code and numerical experiments used for this study are available at https://github.com/cedricwangyu/NoFAS.
    Learn then Test: Calibrating Predictive Algorithms to Achieve Risk Control. (arXiv:2110.01052v4 [cs.LG] UPDATED)
    We introduce a framework for calibrating machine learning models so that their predictions satisfy explicit, finite-sample statistical guarantees. Our calibration algorithm works with any underlying model and (unknown) data-generating distribution and does not require model refitting. The framework addresses, among other examples, false discovery rate control in multi-label classification, intersection-over-union control in instance segmentation, and the simultaneous control of the type-1 error of outlier detection and confidence set coverage in classification or regression. Our main insight is to reframe the risk-control problem as multiple hypothesis testing, enabling techniques and mathematical arguments different from those in the previous literature. We use our framework to provide new calibration methods for several core machine learning tasks with detailed worked examples in computer vision and tabular medical data.
    On Expressivity and Trainability of Quadratic Networks. (arXiv:2110.06081v2 [cs.LG] UPDATED)
    Inspired by the diversity of biological neurons, quadratic artificial neurons can play an important role in deep learning models. The type of quadratic neuron of interest here replaces the inner-product operation in the conventional neuron with a quadratic function. Despite the promising results achieved so far by networks of quadratic neurons, important issues remain unaddressed. Theoretically, the superior expressivity of a quadratic network over either a conventional network or a conventional network with quadratic activation is not fully elucidated, which leaves the use of quadratic networks poorly grounded. Practically, although a quadratic network can be trained via generic backpropagation, it can be subject to a higher risk of collapse than the conventional counterpart. To address these issues, we first apply spline theory and a measure from algebraic geometry to give two theorems that demonstrate better model expressivity of a quadratic network than the conventional counterpart with or without quadratic activation. Then, we propose an effective and efficient training strategy referred to as ReLinear to stabilize the training process of a quadratic network, thereby unleashing the full potential in its associated machine learning tasks. Comprehensive experiments on popular datasets are performed to support our findings and evaluate the performance of quadratic deep learning.
    Artificial Text Detection via Examining the Topology of Attention Maps. (arXiv:2109.04825v2 [cs.CL] UPDATED)
    The impressive capabilities of recent generative models to create texts that are challenging to distinguish from human-written ones can be misused for generating fake news, product reviews, and even abusive content. Despite the prominent performance of existing methods for artificial text detection, they still lack interpretability and robustness towards unseen models. To this end, we propose three novel types of interpretable topological features for this task based on Topological Data Analysis (TDA), which is currently understudied in the field of NLP. We empirically show that the features derived from the BERT model outperform count- and neural-based baselines by up to 10% on three common datasets, and tend to be the most robust towards unseen GPT-style generation models as opposed to existing methods. The probing analysis of the features reveals their sensitivity to the surface and syntactic properties. The results demonstrate that TDA is a promising line with respect to NLP tasks, specifically the ones that incorporate surface and structural information.
    SCRDet++: Detecting Small, Cluttered and Rotated Objects via Instance-Level Feature Denoising and Rotation Loss Smoothing. (arXiv:2004.13316v2 [cs.CV] UPDATED)
    Small and cluttered objects are common in real-world scenes and are challenging to detect. The difficulty is more pronounced when the objects are rotated, as traditional detectors routinely locate objects with horizontal bounding boxes, so that the region of interest is contaminated with background or nearby interleaved objects. In this paper, we introduce the idea of denoising to object detection. Instance-level denoising on the feature map is performed to enhance the detection of small and cluttered objects. To handle rotation variation, we also add a novel IoU constant factor to the smooth L1 loss to address the long-standing boundary problem, which, according to our analysis, is mainly caused by the periodicity of angular (PoA) and exchangeability of edges (EoE). Combining these two features yields our proposed detector, termed SCRDet++. Extensive experiments are performed on the large public aerial image datasets DOTA, DIOR, and UCAS-AOD, as well as the natural image dataset COCO, the scene text dataset ICDAR2015, the small traffic light dataset BSTLD, and S2TLD, which is released with this paper. The results show the effectiveness of our approach. The released S2TLD dataset is publicly available and contains 5,786 images with 14,130 traffic light instances across five categories.
    Policy Gradient Stock GAN for Realistic Discrete Order Data Generation in Financial Markets. (arXiv:2204.13338v1 [cs.LG])
    This study proposes a new generative adversarial network (GAN) for generating realistic orders in financial markets. In some previous works, GANs for financial markets generated fake orders in continuous spaces because of the learning limitations of GAN architectures. However, real orders are discrete, for example in order prices, which have a minimum price unit, and in order types. Thus, in this study we change the generation method to place the generated fake orders into discrete spaces. Because this change disables the ordinary GAN learning algorithm, this study employs a policy gradient, frequently used in reinforcement learning, as the learning algorithm. Through our experiments, we show that our proposed model outperforms previous models in generated order distribution. As an additional benefit of introducing the policy gradient, the entropy of the generated policy can be used to check the GAN's learning status. In the future, higher performance GANs, better evaluation methods, or the applications of our GANs can be addressed.
    Machine learning on DNA-encoded library count data using an uncertainty-aware probabilistic loss function. (arXiv:2108.12471v2 [q-bio.QM] UPDATED)
    DNA-encoded library (DEL) screening and quantitative structure-activity relationship (QSAR) modeling are two techniques used in drug discovery to find small molecules that bind a protein target. Applying QSAR modeling to DEL data can facilitate the selection of compounds for off-DNA synthesis and evaluation. Such a combined approach has been shown recently by training binary classifiers to learn DEL enrichments of aggregated "disynthons" to accommodate the sparse and noisy nature of DEL data. However, a binary classifier cannot distinguish between different levels of enrichment, and information is potentially lost during disynthon aggregation. Here, we demonstrate a regression approach to learning DEL enrichments of individual molecules using a custom negative log-likelihood loss function that effectively denoises DEL data and introduces opportunities for visualization of learned structure-activity relationships (SAR). Our approach explicitly models the Poisson statistics of the sequencing process used in the DEL experimental workflow under a frequentist view. We illustrate this approach on a dataset of 108k compounds screened against CAIX, and a dataset of 5.7M compounds screened against sEH and SIRT2. Due to the treatment of uncertainty in the data through the negative log-likelihood loss function, the models can ignore low-confidence outliers. While our approach does not demonstrate a benefit for extrapolation to novel structures, we expect our denoising and visualization pipeline to be useful in identifying SAR trends and enriched pharmacophores in DEL data. Further, this approach to uncertainty-aware regression is applicable to other sparse or noisy datasets where the nature of stochasticity is known or can be modeled; in particular, the Poisson enrichment ratio metric we use can apply to other settings that compare sequencing count data between two experimental conditions.
    Predicting S&P500 Index direction with Transfer Learning and a Causal Graph as main Input. (arXiv:2011.13113v3 [q-fin.ST] UPDATED)
    We propose a unified multi-tasking framework to represent the complex and uncertain causal process of financial market dynamics, and then to predict the movement of any type of index, with an application to the monthly direction of the S&P500 index. Our solution is based on three main pillars: (i) The use of transfer learning to share knowledge and features (representation, learning) between all financial markets, increase the size of the training sample and preserve the stability between training, validation and test samples. (ii) The combination of multidisciplinary knowledge (financial economics, behavioral finance, market microstructure and portfolio construction theories) to represent the global top-down dynamics of any financial market through a graph. (iii) The integration of forward-looking unstructured data and different types of contexts (long, medium and short term) through latent variables/nodes, and then the use of a unique VAE network (parameter sharing) to learn their distributional representations simultaneously. We obtain accuracy, F1-score, and Matthews correlation of 74.3%, 67%, and 0.42, respectively, above the industry and other benchmarks, on a 12-year test period that includes three unstable and difficult-to-predict sub-periods.
    Bilinear value networks. (arXiv:2204.13695v1 [cs.AI])
    The dominant framework for off-policy multi-goal reinforcement learning involves estimating a goal-conditioned Q-value function. When learning to achieve multiple goals, data efficiency is intimately connected with the generalization of the Q-function to new goals. The de-facto paradigm is to approximate Q(s, a, g) using monolithic neural networks. To improve the generalization of the Q-function, we propose a bilinear decomposition that represents the Q-value via a low-rank approximation in the form of a dot product between two vector fields. The first vector field, f(s, a), captures the environment's local dynamics at the state s, whereas the second component, φ(s, g), captures the global relationship between the current state and the goal. We show that our bilinear decomposition scheme substantially improves data efficiency, and has superior transfer to out-of-distribution goals compared to prior methods. Empirical evidence is provided on the simulated Fetch robot task-suite and dexterous manipulation with a Shadow hand.
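    The decomposition described in this abstract, Q(s, a, g) as a dot product of f(s, a) and φ(s, g), can be sketched in a few lines. This is only an illustration under stated assumptions: the dimensions, the random single-layer tanh maps standing in for the two learned networks, and all function names are hypothetical and not taken from the paper.

```python
import math
import random


def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(u, v))


def make_linear_tanh(in_dim, out_dim, rng):
    """Hypothetical stand-in for a learned network: one random linear map + tanh."""
    weights = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]

    def layer(x):
        return [math.tanh(dot(row, x)) for row in weights]

    return layer


# Illustrative sizes only (assumptions, not values from the paper).
STATE_DIM, ACTION_DIM, GOAL_DIM, RANK = 4, 2, 3, 8

rng = random.Random(0)
f_net = make_linear_tanh(STATE_DIM + ACTION_DIM, RANK, rng)    # f(s, a)
phi_net = make_linear_tanh(STATE_DIM + GOAL_DIM, RANK, rng)    # phi(s, g)


def q_value(s, a, g):
    """Bilinear Q-value: dot product of the two RANK-dimensional embeddings."""
    return dot(f_net(s + a), phi_net(s + g))  # list '+' concatenates the inputs


s = [0.1, -0.2, 0.3, 0.0]
a = [1.0, -1.0]
g = [0.5, 0.5, -0.5]
q = q_value(s, a, g)
```

    In this sketch the goal enters only through phi_net, so evaluating a new goal for the same state-action pair reuses f_net(s + a) unchanged.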
    AlphaZero-Inspired General Board Game Learning and Playing. (arXiv:2204.13307v1 [cs.LG])
    Recently, the seminal algorithms AlphaGo and AlphaZero have started a new era in game learning and deep reinforcement learning. While the achievements of AlphaGo and AlphaZero - playing Go and other complex games at superhuman level - are truly impressive, these architectures have the drawback that they are very complex and require high computational resources. Many researchers are looking for methods that are similar to AlphaZero, but have lower computational demands and are thus more easily reproducible. In this paper, we pick an important element of AlphaZero - the Monte Carlo Tree Search (MCTS) planning stage - and combine it with reinforcement learning (RL) agents. We wrap MCTS for the first time around RL n-tuple networks to create versatile agents that also keep the computational demands low. We apply this new architecture to several complex games (Othello, ConnectFour, Rubik's Cube) and show the advantages achieved with this AlphaZero-inspired MCTS wrapper. In particular, we present results showing that this AlphaZero-inspired agent is the first one trained on standard hardware (no GPU or TPU) to beat the very strong Othello program Edax up to and including level 7 (where most other algorithms could only defeat Edax up to level 2).
    Exchangeability-Aware Sum-Product Networks. (arXiv:2110.05165v2 [cs.LG] UPDATED)
    Sum-Product Networks (SPNs) are expressive probabilistic models that provide exact, tractable inference. They achieve this efficiency by making use of local independence. On the other hand, mixtures of exchangeable variable models (MEVMs) are a class of tractable probabilistic models that make use of exchangeability of discrete random variables to render inference tractable. Exchangeability, which arises naturally in relational domains, has not been considered for efficient representation and inference in SPNs yet. The contribution of this paper is a novel probabilistic model which we call Exchangeability-Aware Sum-Product Networks (XSPNs). It contains both SPNs and MEVMs as special cases, and combines the ability of SPNs to efficiently learn deep probabilistic models with the ability of MEVMs to efficiently handle exchangeable random variables. We introduce a structure learning algorithm for XSPNs and empirically show that they can be more accurate than conventional SPNs when the data contains repeated, interchangeable parts.
    Supervised machine learning classification for short straddles on the S&P500. (arXiv:2204.13587v1 [q-fin.CP])
    In this working paper we present our current progress in the training of machine learning models to execute short option strategies on the S&P500. As a first step, this paper breaks the problem down to a supervised classification task that decides, on a daily basis, whether or not a short straddle on the S&P500 should be executed. We describe the framework we use and present an overview of our evaluation metrics on different classification models. In this preliminary work, using standard machine learning techniques and without hyperparameter search, we find no statistically significant outperformance over a simple "trade always" strategy, but gain additional insights on how we could proceed in further experiments.
    Leader Stochastic Gradient Descent for Distributed Training of Deep Learning Models: Extension. (arXiv:1905.10395v5 [cs.LG] UPDATED)
    We consider distributed optimization under communication constraints for training deep learning models. We propose a new algorithm, whose parameter updates rely on two forces: a regular gradient step, and a corrective direction dictated by the currently best-performing worker (leader). Our method differs from the parameter-averaging scheme EASGD in a number of ways: (i) our objective formulation does not change the location of stationary points compared to the original optimization problem; (ii) we avoid convergence decelerations caused by pulling local workers descending to different local minima to each other (i.e. to the average of their parameters); (iii) our update by design breaks the curse of symmetry (the phenomenon of being trapped in poorly generalizing sub-optimal solutions in symmetric non-convex landscapes); and (iv) our approach is more communication efficient since it broadcasts only parameters of the leader rather than all workers. We provide theoretical analysis of the batch version of the proposed algorithm, which we call Leader Gradient Descent (LGD), and its stochastic variant (LSGD). Finally, we implement an asynchronous version of our algorithm and extend it to the multi-leader setting, where we form groups of workers, each represented by its own local leader (the best performer in a group), and update each worker with a corrective direction comprised of two attractive forces: one to the local, and one to the global leader (the best performer among all workers). The multi-leader setting is well-aligned with current hardware architecture, where local workers forming a group lie within a single computational node and different groups correspond to different nodes. For training convolutional neural networks, we empirically demonstrate that our approach compares favorably to state-of-the-art baselines. This work is a gentle extension of [2].
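    The "two forces" update described in this abstract can be sketched on a toy problem. Everything below is an assumption made for illustration: the quadratic loss, the parameter names lr and pull, and the exact form of the corrective term (a step toward the current best worker) are not given in the abstract.

```python
def loss(x):
    """Toy objective (assumption): squared norm, minimized at the origin."""
    return sum(v * v for v in x)


def grad(x):
    """Gradient of the toy objective."""
    return [2.0 * v for v in x]


def leader_step(workers, lr=0.1, pull=0.5):
    """One synchronous update for all workers.

    Each worker takes a regular gradient step plus a corrective step toward
    the leader, i.e. the worker with the lowest current loss.
    """
    leader = min(workers, key=loss)
    updated = []
    for x in workers:
        g = grad(x)
        updated.append(
            [xi - lr * gi - lr * pull * (xi - li) for xi, gi, li in zip(x, g, leader)]
        )
    return updated


workers = [[1.0, 1.0], [3.0, -2.0], [-4.0, 0.5]]
for _ in range(50):
    workers = leader_step(workers)
final_losses = [loss(x) for x in workers]
```

    For the leader itself the corrective term is zero, so it follows plain gradient descent; only the leader's parameters are needed by the other workers, which matches the communication argument made in point (iv).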
    Overcoming Catastrophic Forgetting via Direction-Constrained Optimization. (arXiv:2011.12581v2 [cs.LG] UPDATED)
    This paper studies a new design of the optimization algorithm for training deep learning models with a fixed architecture of the classification network in a continual learning framework. The training data is non-stationary and the non-stationarity is imposed by a sequence of distinct tasks. We first analyze a deep model trained on only one learning task in isolation and identify a region in network parameter space, where the model performance is close to the recovered optimum. We provide empirical evidence that this region resembles a cone that expands along the convergence direction. We study the principal directions of the trajectory of the optimizer after convergence and show that traveling along a few top principal directions can quickly bring the parameters outside the cone but this is not the case for the remaining directions. We argue that catastrophic forgetting in a continual learning setting can be alleviated when the parameters are constrained to stay within the intersection of the plausible cones of individual tasks that were so far encountered during training. Based on this observation we present our direction-constrained optimization (DCO) method, where for each task we introduce a linear autoencoder to approximate its corresponding top forbidden principal directions. They are then incorporated into the loss function in the form of a regularization term for the purpose of learning the coming tasks without forgetting. Furthermore, in order to control the memory growth as the number of tasks increases, we propose a memory-efficient version of our algorithm called compressed DCO (DCO-COMP) that allocates a memory of fixed size for storing all autoencoders. We empirically demonstrate that our algorithm performs favorably compared to other state-of-the-art regularization-based continual learning methods.
    Evaluating Contextual Embeddings and their Extraction Layers for Depression Assessment. (arXiv:2112.13795v2 [cs.CL] UPDATED)
    Recent works have demonstrated the ability to assess aspects of mental health from personal discourse. At the same time, pre-trained contextual word embedding models have grown to dominate much of NLP, but little is known empirically on how to best apply them for mental health assessment. Using degree of depression as a case study, we do an empirical analysis on which off-the-shelf language model, individual layers, and combinations of layers seem most promising when applied to human-level NLP tasks. Notably, we find RoBERTa most effective and, despite the standard in past work suggesting the second-to-last layer or concatenation of the last 4 layers, we find layer 19 (sixth-to-last) is at least as good as layer 23 when using 1 layer. Further, when using multiple layers, distributing them across the second half (i.e. layers 12+), rather than the last 4, of the 24 layers yielded the most accurate results.
    An Adversarial Attack Analysis on Malicious Advertisement URL Detection Framework. (arXiv:2204.13172v1 [cs.LG])
    Malicious advertisement URLs pose a security risk since they are the source of cyber-attacks, and the need to address this issue is growing in both industry and academia. Generally, the attacker delivers an attack vector to the user by means of an email, an advertisement link or any other means of communication and directs them to a malicious website to steal sensitive information and to defraud them. Existing malicious URL detection techniques are limited in their ability to handle unseen features and to generalize to test data. In this study, we extract a novel set of lexical and web-scraped features and employ machine learning techniques to set up a system for detecting fraudulent advertisement URLs. The combined set of six different kinds of features overcomes the obfuscation in fraudulent URL classification. Based on different statistical properties, we use twelve differently formatted datasets for detection, prediction and classification tasks. We extend our prediction analysis to mismatched and unlabelled datasets. For this framework, we analyze the performance of four machine learning techniques: Random Forest, Gradient Boost, XGBoost and AdaBoost in the detection part. With our proposed method, we can achieve a false negative rate as low as 0.0037 while maintaining high accuracy of 99.63%. Moreover, we devise a novel unsupervised technique for data clustering using the K-Means algorithm for visual analysis. This paper analyses the vulnerability of decision tree-based models using the limited knowledge attack scenario. We considered the exploratory attack and implemented the Zeroth Order Optimization adversarial attack on the detection models.
    Medical Image Segmentation with 3D Convolutional Neural Networks: A Survey. (arXiv:2108.08467v3 [eess.IV] UPDATED)
    Computer-aided medical image analysis plays a significant role in assisting medical practitioners for expert clinical diagnosis and deciding the optimal treatment plan. At present, convolutional neural networks (CNN) are the preferred choice for medical image analysis. In addition, with the rapid advancements in three-dimensional (3D) imaging systems and the availability of excellent hardware and software support to process large volumes of data, 3D deep learning methods are gaining popularity in medical image analysis. Here, we present an extensive review of the recently evolved 3D deep learning methods in medical image segmentation. Furthermore, the research gaps and future directions in 3D medical image segmentation are discussed.
    Improving the Robustness of Federated Learning for Severely Imbalanced Datasets. (arXiv:2204.13414v1 [cs.LG])
    With the ever-increasing data deluge and the success of deep neural networks, research on distributed deep learning has become prominent. Two common approaches to achieving this distributed learning are synchronous and asynchronous weight updates. In this manuscript, we have explored very simplistic synchronous weight update mechanisms. It has been seen that with an increasing number of worker nodes, the performance degrades drastically. This effect has been studied in the context of extreme imbalanced classification (e.g. outlier detection). In practical cases, the assumed i.i.d. conditions may not be fulfilled. Global class imbalance situations may also arise, like that of outlier detection, where the local servers receive severely imbalanced data and may not get any samples from the minority class. In that case, the DNNs in the local servers will get completely biased towards the majority class that they receive. This would highly impact the learning at the parameter server (which practically does not see any data). It has been observed that in a parallel setting, if one uses the existing federated weight update mechanisms at the parameter server, the performance degrades drastically with the increasing number of worker nodes. This is mainly because, with the increasing number of nodes, there is a high chance that one worker node gets a very small portion of the data, either not enough to train the model without overfitting or having a highly imbalanced class distribution. The chapter, hence, proposes a workaround to this problem by introducing the concept of adaptive cost-sensitive momentum averaging. It is seen that for the proposed system, there was little to no degradation in performance, while most of the other methods hit their bottom performance before that.
    Policy Gradient Approach to Compilation of Variational Quantum Circuits. (arXiv:2111.10227v2 [quant-ph] UPDATED)
    We propose a method for finding approximate compilations of quantum unitary transformations, based on techniques from policy gradient reinforcement learning. The choice of a stochastic policy allows us to rephrase the optimization problem in terms of probability distributions, rather than variational gates. In this framework, finding the optimal configuration is done by optimizing over distribution parameters, rather than over free angles. We show numerically that this approach can be more competitive than gradient-free methods, for comparable amounts of resources (i.e. quantum circuit runs). Another interesting feature of this approach to variational compilation is that it does not need a separate register and long-range interactions to estimate the end-point fidelity, which is an improvement over methods which rely on the Hilbert-Schmidt test. We expect these techniques to be relevant for training variational circuits in other contexts.
    Vitruvion: A Generative Model of Parametric CAD Sketches. (arXiv:2109.14124v2 [cs.LG] UPDATED)
    Parametric computer-aided design (CAD) tools are the predominant way that engineers specify physical structures, from bicycle pedals to airplanes to printed circuit boards. The key characteristic of parametric CAD is that design intent is encoded not only via geometric primitives, but also by parameterized constraints between the elements. This relational specification can be viewed as the construction of a constraint program, allowing edits to coherently propagate to other parts of the design. Machine learning offers the intriguing possibility of accelerating the design process via generative modeling of these structures, enabling new tools such as autocompletion, constraint inference, and conditional synthesis. In this work, we present such an approach to generative modeling of parametric CAD sketches, which constitute the basic computational building blocks of modern mechanical design. Our model, trained on real-world designs from the SketchGraphs dataset, autoregressively synthesizes sketches as sequences of primitives, with initial coordinates, and constraints that reference back to the sampled primitives. As samples from the model match the constraint graph representation used in standard CAD software, they may be directly imported, solved, and edited according to downstream design tasks. In addition, we condition the model on various contexts, including partial sketches (primers) and images of hand-drawn sketches. Evaluation of the proposed approach demonstrates its ability to synthesize realistic CAD sketches and its potential to aid the mechanical design workflow.
    Optimal Transport for Unsupervised Denoising Learning. (arXiv:2108.02574v4 [eess.IV] UPDATED)
    Recently, much progress has been made in unsupervised denoising learning. However, existing methods more or less rely on some assumptions on the signal and/or degradation model, which limits their practical performance. How to construct an optimal criterion for unsupervised denoising learning without any prior knowledge on the degradation model is still an open question. Toward answering this question, this work proposes a criterion for unsupervised denoising learning based on the optimal transport theory. This criterion has favorable properties, e.g., approximately maximal preservation of the information of the signal, whilst achieving perceptual reconstruction. Furthermore, though a relaxed unconstrained formulation is used in practical implementation, we prove that the relaxed formulation in theory has the same solution as the original constrained formulation. Experiments on synthetic and real-world data, including realistic photographic, microscopy, depth, and raw depth images, demonstrate that the proposed method even compares favorably with supervised methods, e.g., approaching the PSNR of supervised methods while having better perceptual quality. Particularly, for spatially correlated noise and realistic microscopy images, the proposed method not only achieves better perceptual quality but also has higher PSNR than supervised methods. Besides, it shows remarkable superiority in harsh practical conditions with complex noise, e.g., raw depth images. Code is available at https://github.com/wangweiSJTU/OTUR.
    Deepfake Forensics via An Adversarial Game. (arXiv:2103.13567v2 [cs.CV] UPDATED)
    With the progress in AI-based facial forgery (i.e., deepfake), people are increasingly concerned about its abuse. Although efforts have been made to train classification (also known as deepfake detection) models to recognize such forgeries, existing models suffer from poor generalization to unseen forgery technologies and high sensitivity to changes in image/video quality. In this paper, we advocate adversarial training for improving the generalization ability to both unseen facial forgeries and unseen image/video qualities. We believe training with samples that are adversarially crafted to attack the classification models improves the generalization ability considerably. Considering that AI-based face manipulation often leads to high-frequency artifacts that can be easily spotted by models yet are difficult to generalize from, we further propose a new adversarial training method that attempts to blur out these specific artifacts by introducing pixel-wise Gaussian blurring models. With adversarial training, the classification models are forced to learn more discriminative and generalizable features, and the effectiveness of our method is verified by extensive empirical evidence. Our code will be made publicly available.
    Multi Type Mean Field Reinforcement Learning. (arXiv:2002.02513v6 [cs.MA] UPDATED)
    Mean field theory provides an effective way of scaling multiagent reinforcement learning algorithms to environments with many agents that can be abstracted by a virtual mean agent. In this paper, we extend mean field multiagent algorithms to multiple types. The types enable the relaxation of a core assumption in mean field reinforcement learning, which is that all agents in the environment are playing almost similar strategies and have the same goal. We conduct experiments on three different testbeds for the field of many-agent reinforcement learning, based on the standard MAgent framework. We consider two different kinds of mean field environments: a) games where agents belong to predefined types that are known a priori and b) games where the type of each agent is unknown and therefore must be learned based on observations. We introduce new algorithms for each type of game and demonstrate their superior performance over state-of-the-art algorithms that assume that all agents belong to the same type and other baseline algorithms in the MAgent framework.
    KING: Generating Safety-Critical Driving Scenarios for Robust Imitation via Kinematics Gradients. (arXiv:2204.13683v1 [cs.RO])
    Simulators offer the possibility of safe, low-cost development of self-driving systems. However, current driving simulators exhibit naïve behavior models for background traffic. Hand-tuned scenarios are typically added during simulation to induce safety-critical situations. An alternative approach is to adversarially perturb the background traffic trajectories. In this paper, we study this approach to safety-critical driving scenario generation using the CARLA simulator. We use a kinematic bicycle model as a proxy to the simulator's true dynamics and observe that gradients through this proxy model are sufficient for optimizing the background traffic trajectories. Based on this finding, we propose KING, which generates safety-critical driving scenarios with a 20% higher success rate than black-box optimization. By solving the scenarios generated by KING using a privileged rule-based expert algorithm, we obtain training data for an imitation learning policy. After fine-tuning on this new data, we show that the policy becomes better at avoiding collisions. Importantly, our generated data leads to reduced collisions on both held-out scenarios generated via KING as well as traditional hand-crafted scenarios, demonstrating improved robustness.
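    As a rough illustration of the kind of proxy dynamics the abstract mentions, the sketch below implements one Euler step of a textbook rear-axle kinematic bicycle model. The state layout, the `wheelbase` and `dt` defaults, and the function name are assumptions made for this example and are not taken from the paper. Every operation is smooth in the controls, which is what would let an autodiff framework pass gradients through such a model; plain Python is used here only to show the update equations.

```python
import math
from dataclasses import dataclass


@dataclass
class State:
    x: float      # position along the world x-axis (m)
    y: float      # position along the world y-axis (m)
    yaw: float    # heading angle (rad)
    speed: float  # longitudinal speed (m/s)


def bicycle_step(s: State, accel: float, steer: float,
                 wheelbase: float = 2.9, dt: float = 0.1) -> State:
    """One explicit Euler step of a rear-axle kinematic bicycle model.

    wheelbase and dt are illustrative defaults, not values from the paper.
    """
    x = s.x + s.speed * math.cos(s.yaw) * dt
    y = s.y + s.speed * math.sin(s.yaw) * dt
    yaw = s.yaw + (s.speed / wheelbase) * math.tan(steer) * dt
    speed = s.speed + accel * dt
    return State(x, y, yaw, speed)
```

    With zero steering and zero acceleration the vehicle simply moves straight ahead by `speed * dt` per step.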
    Fuzzy Cognitive Maps and Hidden Markov Models: Comparative Analysis of Efficiency within the Confines of the Time Series Classification Task. (arXiv:2204.13455v1 [cs.LG])
    Time series classification is one of the most popular machine learning tasks. In this paper, we explore the application of the Hidden Markov Model (HMM) to time series classification. We distinguish between two modes of HMM application: in the first, a single model is built for each class; in the second, one HMM is built for each time series. We then transfer both approaches for classifier construction to the domain of Fuzzy Cognitive Maps (FCMs). The four identified models, HMM NN (HMM, one per series), HMM 1C (HMM, one per class), FCM NN, and FCM 1C, are then studied in a series of experiments. We compare the performance of the different models and investigate the impact of their hyperparameters on time series classification accuracy. The empirical evaluation shows a clear advantage of the one-model-per-series approach. The results show that the choice between HMM and FCM should be dataset-dependent.
    Autoencoder based Hybrid Multi-Task Predictor Network for Daily Open-High-Low-Close Prices Prediction of Indian Stocks. (arXiv:2204.13422v1 [cs.LG])
    Stock prices are highly volatile, and sudden changes in trends are often very problematic for traditional forecasting models to handle. Standard Long Short-Term Memory (LSTM) networks are regarded as the state-of-the-art models for such predictions. However, these models fail to handle sudden and drastic changes in the price trend. Moreover, there are some inherent constraints on the open, high, low and close (OHLC) prices of stocks. The literature lacks a study of the inherent properties of OHLC prices. We argue that predicting the OHLC prices for the next day is much more informative than predicting the trends of the stocks, as the trend is mostly calculated using these OHLC prices only. The problem is mainly focused on Buy-Today Sell-Tomorrow (BTST) trading. In this regard, autoencoders (AEs), when pre-trained with the stock prices, may be beneficial. A novel framework is proposed where a pre-trained encoder is cascaded in front of the multi-task predictor network. This hybrid network can leverage the power of a combination of networks and can both handle the OHLC constraints and capture any sudden drastic changes in the prices. It is seen that such a network is much more efficient at predicting stock prices. The experiments have been extended to recommend the most profitable and most overbought stocks on the next day. The model has been tested for multiple Indian companies, and it is found that the recommendations from the proposed model have not resulted in a single loss for a test period of 300 days.
    Cumulative Stay-time Representation for Electronic Health Records in Medical Event Time Prediction. (arXiv:2204.13451v1 [cs.LG])
    We address the problem of predicting when a disease will develop, i.e., medical event time (MET), from a patient's electronic health record (EHR). The MET of non-communicable diseases like diabetes is highly correlated to cumulative health conditions, more specifically, how much time the patient spent with specific health conditions in the past. The common time-series representation is indirect in extracting such information from EHR because it focuses on detailed dependencies between values in successive observations, not cumulative information. We propose a novel data representation for EHR called cumulative stay-time representation (CTR), which directly models such cumulative health conditions. We derive a trainable construction of CTR based on neural networks that has the flexibility to fit the target data and scalability to handle high-dimensional EHR. Numerical experiments using synthetic and real-world datasets demonstrate that CTR alone achieves a high prediction performance, and it enhances the performance of existing models when combined with them.
    Reusability and Transferability of Macro Actions for Reinforcement Learning. (arXiv:1908.01478v3 [cs.NE] UPDATED)
    Conventional reinforcement learning (RL) typically determines an appropriate primitive action at each timestep. However, by using a proper macro action, defined as a sequence of primitive actions, an agent is able to bypass intermediate states to reach a farther state and facilitate its learning procedure. The problem we would like to investigate is what beneficial properties macro actions may possess. In this paper, we unveil the properties of reusability and transferability of macro actions. The first property, reusability, means that a macro action generated along with one RL method can be reused by another RL method for training, while the second one, transferability, means that a macro action can be utilized for training agents in similar environments with different reward settings. In our experiments, we first generate macro actions along with RL methods. We then provide a set of analyses to reveal the properties of reusability and transferability of the generated macro actions.
    A unified theory of information transfer and causal relation. (arXiv:2204.13598v1 [cond-mat.stat-mech])
    Information transfer between coupled stochastic dynamics, measured by transfer entropy and information flow, is suggested as a physical process underlying the causal relation of systems. While information transfer analysis has booming applications in both science and engineering fields, critical mysteries about its foundations remain unsolved. Fundamental yet difficult questions concern how information transfer and causal relation originate, what they depend on, how they differ from each other, and whether they are created by a unified and general quantity. These questions essentially determine the validity of causal relation measurement via information transfer. Here we aim to lay a complete theoretical basis of information transfer and causal relation. Beyond the well-known relations between these concepts that conditionally hold, we demonstrate that information transfer and causal relation universally originate from specific information synergy and redundancy phenomena characterized by high-order mutual information. More importantly, our theory analytically explains the mechanisms for information transfer and causal relation to originate, vanish, and differ from each other. Moreover, our theory naturally defines the effect sizes of information transfer and causal relation based on high-dimensional coupling events. These results may provide a unified view of information, synergy, and causal relation to bridge Pearl's causal inference theory in computer science and information transfer analysis in physics.
    Unaligned Supervision For Automatic Music Transcription in The Wild. (arXiv:2204.13668v1 [cs.SD])
    Multi-instrument Automatic Music Transcription (AMT), or the decoding of a musical recording into semantic musical content, is one of the holy grails of Music Information Retrieval. Current AMT approaches are restricted to piano and (some) guitar recordings, due to difficult data collection. In order to overcome data collection barriers, previous AMT approaches attempt to employ musical scores in the form of a digitized version of the same song or piece. The scores are typically aligned using audio features and strenuous human intervention to generate training labels. We introduce NoteEM, a method for simultaneously training a transcriber and aligning the scores to their corresponding performances, in a fully-automated process. Using this unaligned supervision scheme, complemented by pseudo-labels and pitch-shift augmentation, our method enables training on in-the-wild recordings with unprecedented accuracy and instrumental variety. Using only synthetic data and unaligned supervision, we report SOTA note-level accuracy on the MAPS dataset, and large favorable margins on cross-dataset evaluations. We also demonstrate robustness and ease of use; we report comparable results when training on a small, easily obtainable, self-collected dataset, and we propose alternative labeling to the MusicNet dataset, which we show to be more accurate. Our project page is available at https://benadar293.github.io
    Process-BERT: A Framework for Representation Learning on Educational Process Data. (arXiv:2204.13607v1 [cs.LG])
    Educational process data, i.e., logs of detailed student activities in computerized or online learning platforms, has the potential to offer deep insights into how students learn. One can use process data for many downstream tasks such as learning outcome prediction and automatically delivering personalized intervention. However, analyzing process data is challenging since the specific format of process data varies a lot depending on different learning/testing scenarios. In this paper, we propose a framework for learning representations of educational process data that is applicable across many different learning scenarios. Our framework consists of a pre-training step that uses BERT-type objectives to learn representations from sequential process data and a fine-tuning step that further adjusts these representations on downstream prediction tasks. We apply our framework to the 2019 nation's report card data mining competition dataset that consists of student problem-solving process data and detail the specific models we use in this scenario. We conduct both quantitative and qualitative experiments to show that our framework results in process data representations that are both predictive and informative.
    Unlocking High-Accuracy Differentially Private Image Classification through Scale. (arXiv:2204.13650v1 [cs.LG])
    Differential Privacy (DP) provides a formal privacy guarantee preventing adversaries with access to a machine learning model from extracting information about individual training points. Differentially Private Stochastic Gradient Descent (DP-SGD), the most popular DP training method, realizes this protection by injecting noise during training. However, previous works have found that DP-SGD often leads to a significant degradation in performance on standard image classification benchmarks. Furthermore, some authors have postulated that DP-SGD inherently performs poorly on large models, since the norm of the noise required to preserve privacy is proportional to the model dimension. In contrast, we demonstrate that DP-SGD on over-parameterized models can perform significantly better than previously thought. Combining careful hyper-parameter tuning with simple techniques to ensure signal propagation and improve the convergence rate, we obtain a new SOTA on CIFAR-10 of 81.4% under (8, 10^{-5})-DP using a 40-layer Wide-ResNet, improving over the previous SOTA of 71.7%. When fine-tuning a pre-trained 200-layer Normalizer-Free ResNet, we achieve a remarkable 77.1% top-1 accuracy on ImageNet under (1, 8*10^{-7})-DP, and achieve 81.1% under (8, 8*10^{-7})-DP. This markedly exceeds the previous SOTA of 47.9% under a larger privacy budget of (10, 10^{-6})-DP. We believe our results are a significant step towards closing the accuracy gap between private and non-private image classification.
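    For readers unfamiliar with how DP-SGD injects noise, the sketch below shows the commonly described gradient computation: clip each per-example gradient to a fixed L2 norm, sum, add Gaussian noise scaled to that norm, and average. The function name, argument names, and the use of NumPy are assumptions for illustration; the abstract does not describe the authors' implementation, and privacy accounting (how the noise level maps to an (epsilon, delta) guarantee) is not covered here.

```python
import numpy as np


def dp_sgd_gradient(per_example_grads: np.ndarray, clip_norm: float,
                    noise_multiplier: float,
                    rng: np.random.Generator) -> np.ndarray:
    """Return a noisy, clipped average gradient (illustrative sketch).

    per_example_grads has shape (batch_size, num_params).
    """
    batch_size, num_params = per_example_grads.shape
    # Scale each example's gradient so its L2 norm is at most clip_norm.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale
    # Gaussian noise with standard deviation noise_multiplier * clip_norm.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=num_params)
    return (clipped.sum(axis=0) + noise) / batch_size
```

    Setting `noise_multiplier` to zero reduces this to plain averaging of clipped gradients, which is a convenient sanity check but provides no privacy guarantee.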
    Bona fide Riesz projections for density estimation. (arXiv:2204.13606v1 [eess.SP])
    The projection of sample measurements onto a reconstruction space represented by a basis on a regular grid is a powerful and simple approach to estimate a probability density function. In this paper, we focus on Riesz bases and propose a projection operator that, in contrast to previous works, guarantees the bona fide properties for the estimate, namely, non-negativity and total probability mass $1$. Our bona fide projection is defined as a convex problem. We propose solution techniques and evaluate them. Results suggest an improved performance, specifically in circumstances prone to rippling effects.
    Foundations for learning from noisy quantum experiments. (arXiv:2204.13691v1 [quant-ph])
    Understanding what can be learned from experiments is central to scientific progress. In this work, we use a learning-theoretic perspective to study the task of learning physical operations in a quantum machine when all operations (state preparation, dynamics, and measurement) are a priori unknown. We prove that, without any prior knowledge, if one can explore the full quantum state space by composing the operations, then every operation can be learned. When one cannot explore the full state space but all operations are approximately known and noise in Clifford gates is gate-independent, we find an efficient algorithm for learning all operations up to a single unlearnable parameter characterizing the fidelity of the initial state. For learning a noise channel on Clifford gates to a fixed accuracy, our algorithm uses quadratically fewer experiments than previously known protocols. Under more general conditions, the true description of the noise can be unlearnable; for example, we prove that no benchmarking protocol can learn gate-dependent Pauli noise on Clifford+T gates even under perfect state preparation and measurement. Despite not being able to learn the noise, we show that a noisy quantum computer that performs entangled measurements on multiple copies of an unknown state can yield a large advantage in learning properties of the state compared to a noiseless device that measures individual copies and then processes the measurement data using a classical computer. Concretely, we prove that noisy quantum computers with two-qubit gate error rate $\epsilon$ can achieve a learning task using $N$ copies of the state, while $N^{\Omega(1/\epsilon)}$ copies are required classically.
    Phase Shift Design in RIS Empowered Wireless Networks: From Optimization to AI-Based Methods. (arXiv:2204.13372v1 [cs.LG])
    Reconfigurable intelligent surfaces (RISs) have a revolutionary capability to customize the radio propagation environment for wireless networks. To fully exploit the advantages of RISs in wireless systems, the phases of the reflecting elements must be jointly designed with conventional communication resources, such as beamformers, transmit power, and computation time. However, due to the unique constraints on the phase shift, and massive numbers of reflecting units and users in large-scale networks, the resulting optimization problems are challenging to solve. This paper provides a review of current optimization methods and artificial intelligence-based methods for handling the constraints imposed by RIS and compares them in terms of solution quality and computational complexity. Future challenges in phase shift optimization involving RISs are also described and potential solutions are discussed.
    It's DONE: Direct ONE-shot learning without training optimization. (arXiv:2204.13361v1 [cs.LG])
    Learning a new concept from one example is an advanced function of the human brain, and it is drawing attention in the field of machine learning as the one-shot learning task. In this paper, we propose the simplest method for this task, named Direct ONE-shot learning (DONE). DONE adds a new class to a pretrained deep neural network (DNN) classifier with neither training optimization nor modification of the other classes. DONE is inspired by Hebbian theory and directly uses the neural activity input of the final dense layer, obtained from a data sample that belongs to the new additional class, as the connectivity weight (synaptic strength) to a newly provided output neuron for the new class. DONE requires just one inference for obtaining the output of the final dense layer, and its procedure is simple and deterministic, requiring neither parameter tuning nor hyperparameters. The performance of DONE depends entirely on the pretrained DNN model used as a backbone model, and we confirmed that DONE with a well-trained backbone model achieves practical-level accuracy. DONE has some advantages, including practical use of DNNs where the high cost of training is hard to afford, evaluation of existing DNN models, and understanding of the brain. DONE might be telling us that one-shot learning is an easy task that can be achieved by a simple principle, not only for humans but also for current well-trained DNN models.
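    The core step described above, using the final dense layer's input for one example as the weight vector of a new output neuron, can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the function name, the `(num_classes, num_features)` weight layout, and the zero bias for the new class are choices made for this example, and any scaling or normalization the paper may apply to the feature vector is omitted because the abstract does not describe it.

```python
import numpy as np


def add_class_one_shot(weights: np.ndarray, biases: np.ndarray,
                       feature: np.ndarray, new_bias: float = 0.0):
    """Append one output neuron whose weights are the given feature vector.

    weights: (num_classes, num_features) final dense layer weights.
    biases:  (num_classes,) final dense layer biases.
    feature: (num_features,) input to the final dense layer for the
             single example of the new class.
    new_bias is an assumption; the existing rows are left untouched.
    """
    new_weights = np.vstack([weights, feature[None, :]])
    new_biases = np.append(biases, new_bias)
    return new_weights, new_biases
```

    No optimization is involved: the existing weights are copied unchanged and the new row is set directly, so the procedure is deterministic.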
    List-Mode PET Image Reconstruction Using Deep Image Prior. (arXiv:2204.13404v1 [physics.med-ph])
    List-mode positron emission tomography (PET) image reconstruction is an important tool for PET scanners with many lines-of-response (LORs) and additional information such as time-of-flight and depth-of-interaction. Deep learning is one possible solution to enhance the quality of PET image reconstruction. However, the application of deep learning techniques to list-mode PET image reconstruction has not progressed because list data is a sequence of bit codes and unsuitable for processing by convolutional neural networks (CNNs). In this study, we propose a novel list-mode PET image reconstruction method using an unsupervised CNN called deep image prior (DIP) and a framework of alternating direction method of multipliers. The proposed list-mode DIP reconstruction (LM-DIPRecon) method alternately iterates the regularized list-mode dynamic row action maximum likelihood algorithm (LM-DRAMA) and magnetic resonance imaging conditioned DIP (MR-DIP). We evaluated LM-DIPRecon using both simulation and clinical data, and it achieved sharper images and better tradeoff curves between contrast and noise than LM-DRAMA and MR-DIP. These results indicate that LM-DIPRecon is useful for quantitative PET imaging with limited events. In addition, as list data has finer temporal information than dynamic sinograms, list-mode deep image prior reconstruction is expected to be useful for 4D PET imaging and motion correction.
    Poisoning Deep Learning based Recommender Model in Federated Learning Scenarios. (arXiv:2204.13594v1 [cs.IR])
    Various attack methods against recommender systems have been proposed in the past years, and the security issues of recommender systems have drawn considerable attention. Traditional attacks attempt to make target items recommended to as many users as possible by poisoning the training data. Benefiting from its protection of users' private data, federated recommendation can effectively defend against such attacks. Therefore, quite a few works have been devoted to developing federated recommender systems. To show that current federated recommendation is still vulnerable, in this work we design attack approaches targeting deep learning based recommender models in federated learning scenarios. Specifically, our attacks generate poisoned gradients for manipulated malicious users to upload based on two strategies (i.e., random approximation and hard user mining). Extensive experiments show that our well-designed attacks can effectively poison the target models, and that their attack effectiveness sets the state of the art.
    Model Selection, Adaptation, and Combination for Deep Transfer Learning through Neural Networks in Renewable Energies. (arXiv:2204.13293v1 [cs.LG])
    There is recent interest in using model hubs, collections of pre-trained models, in computer vision tasks. To utilize a model hub, we first select a source model and then adapt the model for the target to compensate for differences. While research on model selection and adaptation is still limited for computer vision tasks, this holds even more for the field of renewable power. At the same time, it is a crucial challenge to meet the increasing demand for power forecasts based on weather features from a numerical weather prediction. We close these gaps by conducting the first thorough experiment for model selection and adaptation for transfer learning in renewable power forecasting, adopting recent results from the field of computer vision on six datasets. We adopt models based on data from different seasons and limit the amount of training data. As an extension of the current state of the art, we utilize a Bayesian linear regression for forecasting the response based on features extracted from a neural network. This approach outperforms the baseline with only seven days of training data. We further show how combining multiple models through ensembles can significantly improve the model selection and adaptation approach. In fact, with more than 30 days of training data, both proposed model combination techniques achieve similar results to those models trained with a full year of training data.
    On Parametric Optimal Execution and Machine Learning Surrogates. (arXiv:2204.08581v2 [q-fin.TR] UPDATED)
    We investigate optimal order execution problems in discrete time with instantaneous price impact and stochastic resilience. First, in the setting of linear transient price impact we derive a closed-form recursion for the optimal strategy, extending the deterministic results from Obizhaeva and Wang (J Financial Markets, 2013). Second, we develop a numerical algorithm based on dynamic programming and deep learning for the case of nonlinear transient price impact as proposed by Bouchaud et al. (Quant. Finance, 2004). Specifically, we utilize an actor-critic framework that constructs two neural-network (NN) surrogates for the value function and the feedback control. The flexible scalability of NN functional approximators enables parametric learning, i.e., incorporating several model or market parameters as part of the input space. Precise calibration of price impact, resilience, etc., is known to be extremely challenging and hence it is critical to understand sensitivity of the execution policy to these parameters. Our NN learner organically scales across multiple input dimensions and is shown to accurately approximate optimal strategies across a wide range of parameter configurations. We provide a fully reproducible Jupyter Notebook with our NN implementation, which is of independent pedagogical interest, demonstrating the ease of use of NN surrogates in (parametric) stochastic control problems.
    Predicting Sleeping Quality using Convolutional Neural Networks. (arXiv:2204.13584v1 [eess.SP])
    Identifying sleep stages and patterns is an essential part of diagnosing and treating sleep disorders. With the advancement of smart technologies, sensor data related to sleeping patterns can be captured easily. In this paper, we propose a Convolutional Neural Network (CNN) architecture that improves the classification performance. In particular, we benchmark the classification performance of different methods, including traditional machine learning methods such as Logistic Regression (LR), Decision Trees (DT), k-Nearest Neighbour (k-NN), Naive Bayes (NB) and Support Vector Machine (SVM), on 3 publicly available sleep datasets. The accuracy, sensitivity, specificity, precision, recall, and F-score are reported and will serve as a baseline to stimulate research in this direction in the future.
    Regotron: Regularizing the Tacotron2 architecture via monotonic alignment loss. (arXiv:2204.13437v1 [cs.SD])
    Recent deep learning Text-to-Speech (TTS) systems have achieved impressive performance by generating speech close to human parity. However, they suffer from training stability issues as well as incorrect alignment of the intermediate acoustic representation with the input text sequence. In this work, we introduce Regotron, a regularized version of Tacotron2 which aims to alleviate the training issues and at the same time produce monotonic alignments. Our method augments the vanilla Tacotron2 objective function with an additional term, which penalizes non-monotonic alignments in the location-sensitive attention mechanism. By properly adjusting this regularization term we show that the loss curves become smoother, and at the same time Regotron consistently produces monotonic alignments in unseen examples even at an early stage (13% of the total number of epochs) of its training process, whereas the fully converged Tacotron2 fails to do so. Moreover, our proposed regularization method has no additional computational overhead, while reducing common TTS mistakes and achieving slightly improved speech naturalness according to subjective mean opinion scores (MOS) collected from 50 evaluators.
    Multi-Player Multi-Armed Bandits with Finite Shareable Resources Arms: Learning Algorithms & Applications. (arXiv:2204.13502v1 [cs.LG])
    Multi-player multi-armed bandits (MMAB) study how decentralized players cooperatively play the same multi-armed bandit so as to maximize their total cumulative rewards. Existing MMAB models mostly assume that when more than one player pulls the same arm, they either have a collision and obtain zero rewards, or have no collision and gain independent rewards, both of which are usually too restrictive in practical scenarios. In this paper, we propose an MMAB with shareable resources as an extension to the collision and non-collision settings. Each shareable arm has finite shareable resources and a "per-load" reward random variable, both of which are unknown to players. The reward from a shareable arm is equal to the "per-load" reward multiplied by the minimum of the number of players pulling the arm and the arm's maximal shareable resources. We consider two types of feedback: sharing demand information (SDI) and sharing demand awareness (SDA), each of which provides different signals of resource sharing. We design the DPE-SDI and SIC-SDA algorithms to address the shareable arm problem under these two cases of feedback respectively and prove that both algorithms have logarithmic regrets that are tight in the number of rounds. We conduct simulations to validate both algorithms' performance and show their utility in wireless networking and edge computing.
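    The reward rule stated in the abstract can be written out directly. In the sketch below, the function and parameter names are made up for illustration, and `per_load_reward` stands for a single realized draw of the arm's "per-load" reward random variable.

```python
def shareable_arm_reward(per_load_reward: float, num_players: int,
                         capacity: int) -> float:
    """Reward of one shareable arm in one round.

    The arm pays per_load_reward for each unit of load it can serve,
    where the served load is capped by the arm's shareable resources.
    """
    return per_load_reward * min(num_players, capacity)
```

    For example, with a per-load reward of 0.5 and a capacity of 2, three players pulling the arm together receive the same total reward as two players would, since the third player exceeds the arm's resources.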
    Predicting batch queue job wait times for informed scheduling of urgent HPC workloads. (arXiv:2204.13543v1 [cs.DC])
    There is increasing interest in the use of HPC machines for urgent workloads to help tackle disasters as they unfold. Whilst batch queue systems are not ideal in supporting such workloads, many disadvantages can be worked around by accurately predicting when a waiting job will start to run. However, there are numerous challenges in achieving such a prediction with high accuracy, not least because the queue's state can change rapidly and depend upon many factors. In this work we explore a novel machine learning approach for predicting queue wait times, hypothesising that such a model can capture the complex behaviour resulting from the queue policy and other interactions to generate accurate job start times. For ARCHER2 (HPE Cray EX), Cirrus (HPE 8600) and 4-cabinet (HPE Cray EX) we explore how different machine learning approaches and techniques improve the accuracy of our predictions, comparing against the estimation generated by Slurm. We demonstrate that our techniques deliver the most accurate predictions across our machines of interest, with the result of this work being the ability to predict job start times within one minute of the actual start time for around 65% of jobs on ARCHER2 and 4-cabinet, and 76% of jobs on Cirrus. When compared against what Slurm can deliver, this represents around 3.8 times better accuracy on ARCHER2 and 18 times better for Cirrus. Furthermore, our approach can accurately predict the start time for three quarters of all jobs within ten minutes of the actual start time on ARCHER2 and 4-cabinet, and for 90% of jobs on Cirrus. Whilst the driver of this work has been to better facilitate placement of urgent workloads across HPC machines, the insights gained can be used to provide wider benefits to users and also enrich existing batch queue systems and inform policy too.
    BI-GreenNet: Learning Green's functions by boundary integral network. (arXiv:2204.13247v1 [cs.LG])
    Green's function plays a significant role in both theoretical analysis and numerical computing of partial differential equations (PDEs). However, in most cases, Green's function is difficult to compute. The difficulties are threefold. Firstly, compared with the original PDE, the dimension of Green's function is doubled, making it impossible to handle with traditional mesh-based methods. Secondly, Green's function usually contains singularities, which make it harder to obtain a good approximation. Lastly, the computational domain may be very complex or even unbounded. To overcome these problems, we leverage the fundamental solution, the boundary integral method and neural networks to develop a new method for computing Green's function with high accuracy. We focus on Green's function of the Poisson and Helmholtz equations in bounded and unbounded domains. We also consider the Poisson and Helmholtz equations in domains with interfaces. Extensive numerical experiments illustrate the efficiency and the accuracy of our method for solving Green's function. In addition, we use the Green's function calculated by our method to solve a class of PDEs and obtain high-precision solutions, which shows the good generalization ability of our method for solving PDEs.
    Deep graph matching meets mixed-integer linear programming: Relax at your own risk? (arXiv:2108.00394v5 [cs.CV] UPDATED)
    Graph matching is an important problem that has received widespread attention, especially in the field of computer vision. Recently, state-of-the-art methods have sought to combine graph matching with deep learning. However, there is no research explaining what role the graph matching algorithm plays in the model. Therefore, we propose an approach integrating a MILP formulation of the graph matching problem. This formulation is solved to optimality and provides an inherent baseline. Meanwhile, similar approaches are derived by relaxing the optimality guarantee of the graph matching solver and by introducing a quality level. This quality level controls the quality of the solutions provided by the graph matching solver. In addition, several relaxations of the graph matching problem are put to the test. Our experimental evaluation gives several theoretical insights and guides the direction of deep graph matching methods.
    Predicting single-cell perturbation responses for unseen drugs. (arXiv:2204.13545v1 [cs.LG])
    Single-cell transcriptomics enabled the study of cellular heterogeneity in response to perturbations at the resolution of individual cells. However, scaling high-throughput screens (HTSs) to measure cellular responses for many drugs remains a challenge due to technical limitations and, more importantly, the cost of such multiplexed experiments. Thus, transferring information from routinely performed bulk RNA-seq HTS is required to enrich single-cell data meaningfully. We introduce a new encoder-decoder architecture to study the perturbational effects of unseen drugs. We combine the model with a transfer learning scheme and demonstrate how training on existing bulk RNA-seq HTS datasets can improve generalisation performance. Better generalisation reduces the need for extensive and costly screens at single-cell resolution. We envision that our proposed method will facilitate more efficient experiment designs through its ability to generate in-silico hypotheses, ultimately accelerating targeted drug discovery.
    Exploring How Anomalous Model Input and Output Alerts Affect Decision-Making in Healthcare. (arXiv:2204.13194v1 [cs.HC])
    An important goal in the field of human-AI interaction is to help users more appropriately trust AI systems' decisions. A situation in which the user may particularly benefit from more appropriate trust is when the AI receives anomalous input or provides anomalous output. To the best of our knowledge, this is the first work towards understanding how anomaly alerts may contribute to appropriate trust of AI. In a formative mixed-methods study with 4 radiologists and 4 other physicians, we explore how AI alerts for anomalous input, very high and low confidence, and anomalous saliency-map explanations affect users' experience with mockups of an AI clinical decision support system (CDSS) for evaluating chest x-rays for pneumonia. We find evidence suggesting that the four anomaly alerts are desired by non-radiologists, and the high-confidence alerts are desired by both radiologists and non-radiologists. In a follow-up user study, we investigate how high- and low-confidence alerts affect the accuracy and thus appropriate trust of 33 radiologists working with AI CDSS mockups. We observe that these alerts do not improve users' accuracy or experience and discuss potential reasons why.
    Machine Learning for Violence Risk Assessment Using Dutch Clinical Notes. (arXiv:2204.13535v1 [cs.LG])
    Violence risk assessment in psychiatric institutions enables interventions to avoid violence incidents. Clinical notes written by practitioners and available in electronic health records are valuable resources capturing unique information, but are seldom used to their full potential. We explore conventional and deep machine learning methods to assess violence risk in psychiatric patients using practitioner notes. The performance of our best models is comparable to the currently used questionnaire-based method, with an area under the Receiver Operating Characteristic curve of approximately 0.8. We find that the deep-learning model BERTje performs worse than conventional machine learning methods. We also evaluate our data and our classifiers to understand the performance of our models better. This is particularly important for the applicability of evaluated classifiers to new data, and is also of great interest to practitioners, due to the increased availability of new data in electronic format.
    Learning Storm Surge with Gradient Boosting. (arXiv:2204.13168v1 [cs.CE])
    Storm surge is a major natural hazard for coastal regions, responsible both for significant property damage and loss of life. Accurate, efficient models of storm surge are needed both to assess long-term risk and to guide emergency management decisions. While high-fidelity ocean circulation models such as the ADvanced CIRCulation (ADCIRC) model can accurately predict storm surge, they are very computationally expensive. Consequently, there have been a number of efforts in recent years to develop data-driven surrogate models for storm surge. While these models can attain good accuracy and are highly efficient, they are often limited to a small geographical region and a fixed set of output locations. We develop a novel surrogate model for peak storm surge prediction based on gradient boosting. Unlike most surrogate approaches, our model is not explicitly constrained to a fixed set of output locations or specific geographical region. The model is trained with a database of 446 synthetic storms that make landfall on the Texas coast and obtains a mean absolute error of 0.25 meters. We additionally present a test of the model on Hurricanes Ike (2008) and Harvey (2017).
    FedShuffle: Recipes for Better Use of Local Work in Federated Learning. (arXiv:2204.13169v1 [cs.LG])
    The practice of applying several local updates before aggregation across clients has been empirically shown to be a successful approach to overcoming the communication bottleneck in Federated Learning (FL). In this work, we propose a general recipe, FedShuffle, that better utilizes the local updates in FL, especially in the heterogeneous regime. Unlike many prior works, FedShuffle does not assume any uniformity in the number of updates per device. Our FedShuffle recipe comprises four simple-yet-powerful ingredients: 1) local shuffling of the data, 2) adjustment of the local learning rates, 3) update weighting, and 4) momentum variance reduction (Cutkosky and Orabona, 2019). We present a comprehensive theoretical analysis of FedShuffle and show that both theoretically and empirically, our approach does not suffer from the objective function mismatch that is present in FL methods which assume homogeneous updates in heterogeneous FL setups, e.g., FedAvg (McMahan et al., 2017). In addition, by combining the ingredients above, FedShuffle improves upon FedNova (Wang et al., 2020), which was previously proposed to solve this mismatch. We also show that FedShuffle with momentum variance reduction can improve upon non-local methods under a Hessian similarity assumption. Finally, through experiments on synthetic and real-world datasets, we illustrate how each of the four ingredients used in FedShuffle helps improve the use of local updates in FL.
    Learning-to-Rank at the Speed of Sampling: Plackett-Luce Gradient Estimation With Minimal Computational Complexity. (arXiv:2204.10872v2 [cs.LG] UPDATED)
    Plackett-Luce gradient estimation enables the optimization of stochastic ranking models within feasible time constraints through sampling techniques. Unfortunately, the computational complexity of existing methods does not scale well with the length of the rankings, i.e. the ranking cutoff, nor with the item collection size. In this paper, we introduce the novel PL-Rank-3 algorithm that performs unbiased gradient estimation with a computational complexity comparable to the best sorting algorithms. As a result, our novel learning-to-rank method is applicable in any scenario where standard sorting is feasible in reasonable time. Our experimental results indicate large gains in the time required for optimization, without any loss in performance. For the field, our contribution could potentially allow state-of-the-art learning-to-rank methods to be applied to much larger scales than previously feasible.
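    As a rough illustration of why Plackett-Luce sampling can cost about as much as sorting, the sketch below draws a top-k ranking with the Gumbel-top-k trick and evaluates the log-probability of a ranking. It is not the PL-Rank-3 estimator from the abstract; the function names `sample_pl_ranking` and `pl_log_prob` and the example scores are assumptions made for illustration.

```python
import numpy as np

def sample_pl_ranking(log_scores, k, rng):
    """Draw a top-k ranking from a Plackett-Luce model (Gumbel-top-k trick)."""
    log_scores = np.asarray(log_scores, dtype=float)
    # Adding i.i.d. Gumbel noise and sorting in descending order
    # yields a sample from the Plackett-Luce distribution.
    perturbed = log_scores + rng.gumbel(size=log_scores.shape)
    return np.argsort(-perturbed)[:k]

def pl_log_prob(log_scores, ranking):
    """Log-probability of a (possibly partial) ranking under Plackett-Luce."""
    log_scores = np.asarray(log_scores, dtype=float)
    remaining = np.ones(len(log_scores), dtype=bool)
    total = 0.0
    for item in ranking:
        # Each position is a softmax choice among the items not yet placed.
        total += log_scores[item] - np.logaddexp.reduce(log_scores[remaining])
        remaining[item] = False
    return total

rng = np.random.default_rng(0)
ranking = sample_pl_ranking([2.0, 1.0, 0.0, -1.0], k=3, rng=rng)
```

    The sampling step is dominated by the sort, which is the sense in which sampling-based estimators can approach sorting cost; the gradient estimator itself is described in the paper.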
    Unsupervised Spatial-spectral Hyperspectral Image Reconstruction and Clustering with Diffusion Geometry. (arXiv:2204.13497v1 [cs.CV])
    Hyperspectral images, which store a hundred or more spectral bands of reflectance, have become an important data source in natural and social sciences. Hyperspectral images are often generated in large quantities at a relatively coarse spatial resolution. As such, unsupervised machine learning algorithms incorporating known structure in hyperspectral imagery are needed to analyze these images automatically. This work introduces the Spatial-Spectral Image Reconstruction and Clustering with Diffusion Geometry (DSIRC) algorithm for partitioning highly mixed hyperspectral images. DSIRC reduces measurement noise through a shape-adaptive reconstruction procedure. In particular, for each pixel, DSIRC locates spectrally correlated pixels within a data-adaptive spatial neighborhood and reconstructs that pixel's spectral signature using those of its neighbors. DSIRC then locates high-density, high-purity pixels far in diffusion distance (a data-dependent distance metric) from other high-density, high-purity pixels and treats these as cluster exemplars, giving each a unique label. Non-modal pixels are assigned the label of their diffusion distance-nearest neighbor of higher density and purity that is already labeled. Strong numerical results indicate that incorporating spatial information through image reconstruction substantially improves the performance of pixel-wise clustering.
    On the Normalizing Constant of the Continuous Categorical Distribution. (arXiv:2204.13290v1 [stat.ML])
    Probability distributions supported on the simplex enjoy a wide range of applications across statistics and machine learning. Recently, a novel family of such distributions has been discovered: the continuous categorical. This family enjoys remarkable mathematical simplicity; its density function resembles that of the Dirichlet distribution, but with a normalizing constant that can be written in closed form using elementary functions only. In spite of this mathematical simplicity, our understanding of the normalizing constant remains far from complete. In this work, we characterize the numerical behavior of the normalizing constant and we present theoretical and methodological advances that can, in turn, help to enable broader applications of the continuous categorical distribution. Our code is available at https://github.com/cunningham-lab/cb_and_cc/.
    Machine learning for knowledge acquisition and accelerated inverse-design for non-Hermitian systems. (arXiv:2204.13376v1 [physics.optics])
    Non-Hermitian systems offer new platforms for unusual physical properties that can be flexibly manipulated by redistribution of the real and imaginary parts of refractive indices, whose presence breaks conventional wave propagation symmetries, leading to asymmetric reflection and symmetric transmission with respect to the wave propagation direction. Here, we use supervised and unsupervised learning techniques for knowledge acquisition in non-Hermitian systems, which accelerates the inverse design process. In particular, we construct a deep learning model that relates the transmission and asymmetric reflection in non-conservative settings and propose sub-manifold learning to recognize non-Hermitian features from transmission spectra. The developed deep learning framework determines the feasibility of a desired spectral response for a given structure and uncovers the role of effective gain-loss parameters to tailor the spectral response. These findings pave the way for intelligent inverse design and shape our understanding of the physical mechanism in general non-Hermitian systems.
    MetaCVR: Conversion Rate Prediction via Meta Learning in Small-Scale Recommendation Scenarios. (arXiv:2112.13753v5 [cs.LG] UPDATED)
    Different from large-scale platforms such as Taobao and Amazon, CVR modeling in small-scale recommendation scenarios is more challenging due to the severe Data Distribution Fluctuation (DDF) issue. DDF prevents existing CVR models from being effective since 1) several months of data are needed to train CVR models sufficiently in small scenarios, leading to considerable distribution discrepancy between training and online serving; and 2) e-commerce promotions have significant impacts on small scenarios, leading to distribution uncertainty of the upcoming time period. In this work, we propose a novel CVR method named MetaCVR from a perspective of meta learning to address the DDF issue. Firstly, a base CVR model which consists of a Feature Representation Network (FRN) and output layers is designed and trained sufficiently with samples across months. Then we treat time periods with different data distributions as different occasions and obtain positive and negative prototypes for each occasion using the corresponding samples and the pre-trained FRN. Subsequently, a Distance Metric Network (DMN) is devised to calculate the distance metrics between each sample and all prototypes to help mitigate the distribution uncertainty. Finally, we develop an Ensemble Prediction Network (EPN) which incorporates the output of FRN and DMN to make the final CVR prediction. In this stage, we freeze the FRN and train the DMN and EPN with samples from the recent time period, thereby effectively easing the distribution discrepancy. To the best of our knowledge, this is the first study of CVR prediction targeting the DDF issue in small-scale recommendation scenarios. Experimental results on real-world datasets validate the superiority of our MetaCVR, and an online A/B test also shows our model achieves impressive gains of 11.92% on PCVR and 8.64% on GMV.
    Music Enhancement via Image Translation and Vocoding. (arXiv:2204.13289v1 [cs.SD])
    Consumer-grade music recordings such as those captured by mobile devices typically contain distortions in the form of background noise, reverb, and microphone-induced EQ. This paper presents a deep learning approach to enhance low-quality music recordings by combining (i) an image-to-image translation model for manipulating audio in its mel-spectrogram representation and (ii) a music vocoding model for mapping synthetically generated mel-spectrograms to perceptually realistic waveforms. We find that this approach to music enhancement outperforms baselines which use classical methods for mel-spectrogram inversion and an end-to-end approach directly mapping noisy waveforms to clean waveforms. Additionally, in evaluating the proposed method with a listening test, we analyze the reliability of common audio enhancement evaluation metrics when used in the music domain.
    Control-Aware Prediction Objectives for Autonomous Driving. (arXiv:2204.13319v1 [cs.LG])
    Autonomous vehicle software is typically structured as a modular pipeline of individual components (e.g., perception, prediction, and planning) to help separate concerns into interpretable sub-tasks. Even when end-to-end training is possible, each module has its own set of objectives used for safety assurance, sample efficiency, regularization, or interpretability. However, intermediate objectives do not always align with overall system performance. For example, optimizing the likelihood of a trajectory prediction module might focus more on easy-to-predict agents than safety-critical or rare behaviors (e.g., jaywalking). In this paper, we present control-aware prediction objectives (CAPOs), to evaluate the downstream effect of predictions on control without requiring the planner be differentiable. We propose two types of importance weights that weight the predictive likelihood: one using an attention model between agents, and another based on control variation when exchanging predicted trajectories for ground truth trajectories. Experimentally, we show our objectives improve overall system performance in suburban driving scenarios using the CARLA simulator.
    Partitioned Variational Inference: A Framework for Probabilistic Federated Learning. (arXiv:2202.12275v4 [stat.ML] UPDATED)
    The proliferation of computing devices has brought about an opportunity to deploy machine learning models on new problem domains using previously inaccessible data. Traditional algorithms for training such models often require data to be stored on a single machine with compute performed by a single node, making them unsuitable for decentralised training on multiple devices. This deficiency has motivated the development of federated learning algorithms, which allow multiple data owners to train collaboratively and use a shared model whilst keeping local data private. However, many of these algorithms focus on obtaining point estimates of model parameters, rather than probabilistic estimates capable of capturing model uncertainty, which is essential in many applications. Variational inference (VI) has become the method of choice for fitting many modern probabilistic models. In this paper we introduce partitioned variational inference (PVI), a general framework for performing VI in the federated setting. We develop new supporting theory for PVI, demonstrating a number of properties that make it an attractive choice for practitioners; use PVI to unify a wealth of fragmented, yet related literature; and provide empirical results that showcase the effectiveness of PVI in a variety of federated settings.
    UNBUS: Uncertainty-aware Deep Botnet Detection System in Presence of Perturbed Samples. (arXiv:2204.09502v2 [cs.CR] UPDATED)
    A rising number of botnet families have been successfully detected using deep learning architectures. While the variety of attacks increases, these architectures should become more robust against attacks. They have been proven to be very sensitive to small but well constructed perturbations in the input. Botnet detection requires extremely low false-positive rates (FPR), which are not commonly attainable in contemporary deep learning. Attackers try to increase the FPRs by making poisoned samples. The majority of recent research has focused on the use of model loss functions to build adversarial examples and robust models. In this paper, two LSTM-based classification algorithms for botnet classification with an accuracy higher than 98% are presented. Then, an adversarial attack is proposed, which reduces the accuracy to about 30%. Then, by examining methods for computing the uncertainty, a defense method is proposed that increases the accuracy to about 70%. Using the deep ensemble and stochastic weight averaging quantification methods, the uncertainty of the accuracy of the proposed methods is investigated.
    Actor-Critic Scheduling for Path-Aware Air-to-Ground Multipath Multimedia Delivery. (arXiv:2204.13343v1 [cs.NI])
    Reinforcement Learning (RL) has recently found wide applications in network traffic management and control because some of its variants do not require prior knowledge of network models. In this paper, we present a novel scheduler for real-time multimedia delivery in multipath systems based on an Actor-Critic (AC) RL algorithm. We focus on a challenging scenario of real-time video streaming from an Unmanned Aerial Vehicle (UAV) using multiple wireless paths. The scheduler acting as an RL agent learns in real-time the optimal policy for path selection, path rate allocation and redundancy estimation for flow protection. The scheduler, implemented as a module of the GStreamer framework, can be used in real or simulated settings. The simulation results show that our scheduler can target a very low loss rate at the receiver by dynamically adapting in real-time the scheduling policy to the path conditions without performing training or relying on prior knowledge of network channel models.
    ELM: Embedding and Logit Margins for Long-Tail Learning. (arXiv:2204.13208v1 [cs.LG])
    Long-tail learning is the problem of learning under skewed label distributions, which pose a challenge for standard learners. Several recent approaches for the problem have proposed enforcing a suitable margin in logit space. Such techniques are intuitive analogues of the guiding principle behind SVMs, and are equally applicable to linear models and neural models. However, when applied to neural models, such techniques do not explicitly control the geometry of the learned embeddings. This can be potentially sub-optimal, since embeddings for tail classes may be diffuse, resulting in poor generalization for these classes. We present Embedding and Logit Margins (ELM), a unified approach to enforce margins in logit space, and regularize the distribution of embeddings. This connects losses for long-tail learning to proposals in the literature on metric embedding, and contrastive learning. We theoretically show that minimising the proposed ELM objective helps reduce the generalisation gap. The ELM method is shown to perform well empirically, and results in tighter tail class embeddings.
    Minimizing Client Drift in Federated Learning via Adaptive Bias Estimation. (arXiv:2204.13170v1 [cs.LG])
    In Federated Learning, a number of clients collaborate to train a model without sharing their data. Client models are optimized locally and are communicated through a central hub called the server. A major challenge is to deal with heterogeneity among clients' data, which causes the local optimization to drift away from the global objective. In order to estimate and therefore remove this drift, variance reduction techniques have recently been incorporated into Federated Learning optimization. However, the existing solutions propagate the error of their estimations throughout the optimization trajectory, which leads to inaccurate approximations of the clients' drift and ultimately failure to remove it properly. In this paper, we address this issue by introducing an adaptive algorithm that efficiently reduces clients' drift. Compared to previous works on adapting variance reduction to Federated Learning, our approach uses less or the same level of communication bandwidth, computation, or memory. Additionally, it addresses the instability problem, prevalent in prior work, that is caused by the increasing norm of the estimates, which makes our approach a much more practical solution for large-scale Federated Learning settings. Our experimental results demonstrate that the proposed algorithm converges significantly faster and achieves higher accuracy compared to the baselines in an extensive set of Federated Learning benchmarks.
    Continual Learning with Bayesian Model based on a Fixed Pre-trained Feature Extractor. (arXiv:2204.13349v1 [cs.LG])
    Deep learning has shown its human-level performance in various applications. However, current deep learning models are characterised by catastrophic forgetting of old knowledge when learning new classes. This poses a challenge particularly in intelligent diagnosis systems where initially only training data of a limited number of diseases are available. In this case, updating the intelligent system with data of new diseases would inevitably downgrade its performance on previously learned diseases. Inspired by the process of learning new knowledge in human brains, we propose a Bayesian generative model for continual learning built on a fixed pre-trained feature extractor. In this model, knowledge of each old class can be compactly represented by a collection of statistical distributions, e.g. with Gaussian mixture models, and naturally kept from forgetting in continual learning over time. Unlike existing class-incremental learning methods, the proposed approach is not sensitive to the continual learning process and can be additionally well applied to the data-incremental learning scenario. Experiments on multiple medical and natural image classification tasks showed that the proposed approach outperforms state-of-the-art approaches which even keep some images of old classes during continual learning of new classes.
    AutoLossGen: Automatic Loss Function Generation for Recommender Systems. (arXiv:2204.13160v1 [cs.IR])
    In recommendation systems, the choice of loss function is critical since a good loss may significantly improve the model performance. However, manually designing a good loss is a big challenge due to the complexity of the problem. A large fraction of previous work focuses on handcrafted loss functions, which require significant expertise and human effort. In this paper, inspired by the recent development of automated machine learning, we propose an automatic loss function generation framework, AutoLossGen, which is able to generate loss functions directly constructed from basic mathematical operators without prior knowledge of the loss structure. More specifically, we develop a controller model driven by reinforcement learning to generate loss functions, and develop an iterative and alternating optimization schedule to update the parameters of both the controller model and the recommender model. One challenge for automatic loss generation in recommender systems is the extreme sparsity of recommendation datasets, which leads to the sparse reward problem for loss generation and search. To solve the problem, we further develop a reward filtering mechanism for efficient and effective loss generation. Experimental results show that our framework manages to create tailored loss functions for different recommendation models and datasets, and the generated loss gives better recommendation performance than commonly used baseline losses. Besides, most of the generated losses are transferable, i.e., the loss generated based on one model and dataset also works well for another model or dataset. Source code of the work is available at https://github.com/rutgerswiselab/AutoLossGen.
    SwiftAgg+: Achieving Asymptotically Optimal Communication Loads in Secure Aggregation for Federated Learning. (arXiv:2203.13060v2 [cs.IT] UPDATED)
    We propose SwiftAgg+, a novel secure aggregation protocol for federated learning systems, where a central server aggregates local models of $N \in \mathbb{N}$ distributed users, each of size $L \in \mathbb{N}$, trained on their local data, in a privacy-preserving manner. SwiftAgg+ can significantly reduce the communication overheads without any compromise on security, and achieve optimal communication loads within diminishing gaps. Specifically, in presence of at most $D$ dropout users, SwiftAgg+ achieves a per-user communication load of $(1+\mathcal{O}(\frac{1}{N}))L$ and a server communication load of $(1+\mathcal{O}(\frac{1}{N}))L$, with a worst-case information-theoretic security guarantee, against any subset of up to $T$ semi-honest users who may also collude with the curious server. Moreover, the proposed SwiftAgg+ allows for a flexible trade-off between communication loads and the number of active communication links. In particular, for any $K\in\mathbb{N}$, SwiftAgg+ can achieve the server communication load of $(1+\frac{T}{K})L$, and per-user communication load of up to $(1+\frac{T+D}{K})L$, where the number of pair-wise active connections in the network is $\frac{N}{2}(K+T+D+1)$.
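    To make the trade-off stated in the abstract concrete, the following sketch simply evaluates the three quoted expressions for a given $K$. The function name `swiftagg_plus_loads` and the example parameter values are hypothetical; the abstract does not state constraints relating $K$, $N$, $T$ and $D$, so none are enforced here.

```python
from fractions import Fraction

def swiftagg_plus_loads(N, L, T, D, K):
    """Evaluate the communication expressions quoted in the abstract.

    N: number of users, L: model size, T: colluding users tolerated,
    D: dropouts tolerated, K: trade-off parameter.
    """
    server_load = (1 + Fraction(T, K)) * L            # (1 + T/K) L
    per_user_load_max = (1 + Fraction(T + D, K)) * L  # up to (1 + (T+D)/K) L
    active_links = Fraction(N, 2) * (K + T + D + 1)   # (N/2)(K+T+D+1)
    return server_load, per_user_load_max, active_links

# Hypothetical example values, chosen only for illustration.
server, per_user, links = swiftagg_plus_loads(N=100, L=1000, T=10, D=10, K=20)
```

    Increasing `K` pushes both loads toward `L` while increasing the number of active pairwise connections, which is the trade-off the abstract describes.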
    Gaussian Processes and Statistical Decision-making in Non-Euclidean Spaces. (arXiv:2202.10613v3 [stat.ML] UPDATED)
    Bayesian learning using Gaussian processes provides a foundational framework for making decisions in a manner that balances what is known with what could be learned by gathering data. In this dissertation, we develop techniques for broadening the applicability of Gaussian processes. This is done in two ways. Firstly, we develop pathwise conditioning techniques for Gaussian processes, which allow one to express posterior random functions as prior random functions plus a dependent update term. We introduce a wide class of efficient approximations built from this viewpoint, which can be randomly sampled once in advance, and evaluated at arbitrary locations without any subsequent stochasticity. This key property improves efficiency and makes it simpler to deploy Gaussian process models in decision-making settings. Secondly, we develop a collection of Gaussian process models over non-Euclidean spaces, including Riemannian manifolds and graphs. We derive fully constructive expressions for the covariance kernels of scalar-valued Gaussian processes on Riemannian manifolds and graphs. Building on these ideas, we describe a formalism for defining vector-valued Gaussian processes on Riemannian manifolds. The introduced techniques allow all of these models to be trained using standard computational methods. In total, these contributions make Gaussian processes easier to work with and allow them to be used within a wider class of domains in an effective and principled manner. This, in turn, makes it possible to potentially apply Gaussian processes to novel decision-making settings.
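    For readers unfamiliar with pathwise conditioning, the statement "posterior random functions as prior random functions plus a dependent update term" is commonly written as follows (Matheron's rule). The notation is an assumption for illustration: $X$ are training inputs, $y$ noisy observations with noise variance $\sigma^2$, and $K$ the prior covariance kernel; the dissertation's own notation may differ.

```latex
% Prior sample f ~ GP(0, k); the same f appears on both sides.
(f \mid y)(\cdot)
  \;=\; \underbrace{f(\cdot)}_{\text{prior sample}}
  \;+\; \underbrace{K_{(\cdot)X}\,\bigl(K_{XX} + \sigma^{2} I\bigr)^{-1}
        \bigl(y - f(X) - \varepsilon\bigr)}_{\text{data-dependent update}},
\qquad \varepsilon \sim \mathcal{N}(0, \sigma^{2} I).
```

    Once the prior sample $f$ and the noise draw $\varepsilon$ are fixed, the right-hand side is a deterministic function of the input location, which matches the abstract's remark about evaluation without subsequent stochasticity.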
    Model-Based Safe Policy Search from Signal Temporal Logic Specifications Using Recurrent Neural Networks. (arXiv:2103.15938v2 [eess.SY] UPDATED)
    We propose a policy search approach to learn controllers from specifications given as Signal Temporal Logic (STL) formulae. The system model, which is unknown but assumed to be an affine control system, is learned together with the control policy. The model is implemented as two feedforward neural networks (FNNs) - one for the drift, and one for the control directions. To capture the history dependency of STL specifications, we use a recurrent neural network (RNN) to implement the control policy. In contrast to prevalent model-free methods, the learning approach proposed here takes advantage of the learned model and is more efficient. We use control barrier functions (CBFs) with the learned model to improve the safety of the system. We validate our algorithm via simulations and experiments. The results show that our approach can satisfy the given specification within very few system runs, and can be used for on-line control.
    Compositional Federated Learning for Distributionally Robust and Meta Learning. (arXiv:2106.11264v2 [cs.LG] UPDATED)
    In this paper, we propose an effective and efficient Compositional Federated Learning (ComFedL) algorithm for solving a new compositional Federated Learning (FL) framework, which frequently appears in many data mining and machine learning problems with a hierarchical structure such as distributionally robust FL and model-agnostic meta learning (MAML). Moreover, we study the convergence of our ComFedL algorithm under some mild conditions, and prove that it achieves a convergence rate of $O(\frac{1}{\sqrt{T}})$, where $T$ denotes the number of iterations. To the best of our knowledge, our new Compositional FL framework is the first work to bridge federated learning with composition stochastic optimization. In particular, we first transform the distributionally robust FL (i.e., a minimax optimization problem) into a simple composition optimization problem by using KL divergence regularization. At the same time, we also transform the distribution-agnostic MAML problem (i.e., a minimax optimization problem) into a simple yet effective composition optimization problem. Finally, we use two popular machine learning tasks, i.e., distributionally robust FL and MAML, to demonstrate the effectiveness of our algorithm.
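    The step from a minimax problem to a composition problem via KL regularization can be illustrated with a standard identity. The symbols are assumptions for illustration ($n$ clients with losses $f_i(w)$, weights $p$ on the simplex $\Delta_n$, regularization strength $\lambda > 0$); the paper's exact formulation may differ.

```latex
\min_{w}\;\max_{p \in \Delta_n}\;
  \sum_{i=1}^{n} p_i f_i(w) \;-\; \lambda\,\mathrm{KL}\!\left(p \,\Big\|\, \tfrac{1}{n}\mathbf{1}\right)
\;=\;
\min_{w}\; \lambda \log\!\left( \frac{1}{n} \sum_{i=1}^{n} \exp\!\bigl(f_i(w)/\lambda\bigr) \right).
```

    The inner maximum is attained at $p_i \propto \exp(f_i(w)/\lambda)$, and the right-hand side has the compositional form $g(h(w))$ with $h(w) = \frac{1}{n}\sum_i \exp(f_i(w)/\lambda)$ and $g(u) = \lambda \log u$.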
    ES-ENAS: Blackbox Optimization over Hybrid Spaces via Combinatorial and Continuous Evolution. (arXiv:2101.07415v5 [cs.LG] UPDATED)
    In this paper, we approach the problem of optimizing blackbox functions over large hybrid search spaces consisting of both combinatorial and continuous parameters. We demonstrate that previous evolutionary algorithms which rely on mutation-based approaches, while flexible over combinatorial spaces, suffer from a curse of dimensionality in high dimensional continuous spaces both theoretically and empirically, which thus limits their scope over hybrid search spaces as well. In order to combat this curse, we propose ES-ENAS, a simple and modular joint optimization procedure combining the class of sample-efficient smoothed gradient techniques, commonly known as Evolutionary Strategies (ES), with combinatorial optimizers in a highly scalable and intuitive way, inspired by the one-shot or supernet paradigm introduced in Efficient Neural Architecture Search (ENAS). By doing so, we achieve significantly more sample efficiency, which we empirically demonstrate over synthetic benchmarks, and are further able to apply ES-ENAS for architecture search over popular RL benchmarks.
    Classifier Calibration: with application to threat scores in cybersecurity. (arXiv:2102.05143v3 [cs.LG] UPDATED)
    This paper explores the calibration of a classifier output score in binary classification problems. A calibrator is a function that maps the arbitrary classifier score of a testing observation onto $[0,1]$ to provide an estimate for the posterior probability of belonging to one of the two classes. Calibration is important for two reasons: first, it provides a meaningful score, that is, the posterior probability; second, it puts the scores of different classifiers on the same scale for comparable interpretation. The paper presents three main contributions: (1) Introducing multi-score calibration, when more than one classifier provides a score for a single observation. (2) Introducing the idea that the classifier scores to a calibration process are nothing but features to a classifier, hence proposing expanding the classifier scores to higher dimensions to boost the calibrator's performance. (3) Conducting a massive simulation study, in the order of 24,000 experiments, that incorporates different configurations, in addition to experimenting on two real datasets from the cybersecurity domain. The results show that there is no overall winner among the different calibrators and different configurations. However, general advice for practitioners includes the following: Platt's calibrator (Platt, 1999), a version of the logistic regression that decreases bias for a small sample size, has a very stable and acceptable performance among all experiments; our suggested multi-score calibration provides better performance than single score calibration in the majority of experiments, including the two real datasets. In addition, expanding the scores can help in some experiments.
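    A minimal sketch of single-score Platt-style calibration is given below, assuming plain gradient descent rather than the optimizer used in the original method. The names `fit_platt` and `calibrate`, the learning rate, and the toy data are hypothetical. The smoothed targets $(N_+ + 1)/(N_+ + 2)$ and $1/(N_- + 2)$ are the small-sample correction associated with Platt's method; the sign convention here is $p = \sigma(a s + b)$.

```python
import numpy as np

def fit_platt(scores, labels, lr=0.1, n_iter=5000):
    """Fit p = sigmoid(a * score + b) to smoothed binary targets."""
    s = np.asarray(scores, dtype=float)
    y = np.asarray(labels, dtype=float)
    n_pos, n_neg = y.sum(), (1.0 - y).sum()
    # Smoothed targets instead of hard 0/1 labels (small-sample correction).
    t = np.where(y == 1, (n_pos + 1.0) / (n_pos + 2.0), 1.0 / (n_neg + 2.0))
    a, b = 0.0, 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(a * s + b)))
        grad_logit = p - t              # d(cross-entropy)/d(logit)
        a -= lr * np.mean(grad_logit * s)
        b -= lr * np.mean(grad_logit)
    return a, b

def calibrate(scores, a, b):
    """Map raw classifier scores to estimated posterior probabilities."""
    return 1.0 / (1.0 + np.exp(-(a * np.asarray(scores, dtype=float) + b)))

# Toy data: higher scores tend to belong to the positive class.
a, b = fit_platt([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0], [0, 0, 0, 1, 1, 1])
probs = calibrate([-3.0, 0.0, 3.0], a, b)
```

    Multi-score calibration, as described in the abstract, would replace the scalar score with a vector of scores from several classifiers; that extension is not shown here.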
    Consistent Relative Confidence and Label-Free Model Selection for Convolutional Neural Networks. (arXiv:2108.11845v7 [cs.CV] UPDATED)
    This letter is concerned with image classification with deep convolutional neural networks (CNNs). The focus is on the following question: given a set of candidate CNN models, how to select the right one with the best generalization property for the current task? Present model selection methods require access to a batch of labeled data for computing a pre-specified performance metric, such as the cross-entropy loss, the classification error rate, the negative log-likelihood. In many practical cases, labels are not available in time as labeling itself is a time-consuming and expensive task. To this end, this letter presents an approach to CNN model selection using only unlabeled data. This method is developed based on a principle termed consistent relative confidence. The effectiveness and efficiency of the proposed method are demonstrated by experiments using benchmark datasets.
    Self-organizing Democratized Learning: Towards Large-scale Distributed Learning Systems. (arXiv:2007.03278v3 [cs.LG] UPDATED)
    Emerging cross-device artificial intelligence (AI) applications require a transition from conventional centralized learning systems towards large-scale distributed AI systems that can collaboratively perform complex learning tasks. In this regard, democratized learning (Dem-AI) lays out a holistic philosophy with underlying principles for building large-scale distributed and democratized machine learning systems. The outlined principles are meant to study a generalization in distributed learning systems that goes beyond existing mechanisms such as federated learning. Moreover, such learning systems rely on hierarchical self-organization of well-connected distributed learning agents who have limited and highly personalized data and can evolve and regulate themselves based on the underlying duality of specialized and generalized processes. Inspired by the Dem-AI philosophy, a novel distributed learning approach is proposed in this paper. The approach consists of a self-organizing hierarchical structuring mechanism based on agglomerative clustering, hierarchical generalization, and a corresponding learning mechanism. Subsequently, hierarchical generalized learning problems in recursive forms are formulated and shown to be approximately solved using the solutions of distributed personalized learning problems and hierarchical update mechanisms. To that end, a distributed learning algorithm, namely DemLearn, is proposed. Extensive experiments on benchmark MNIST, Fashion-MNIST, FE-MNIST, and CIFAR-10 datasets show that the proposed algorithms demonstrate better results in the generalization performance of learning models in agents compared to the conventional FL algorithms. The detailed analysis provides useful observations to further handle both the generalization and specialization performance of the learning models in Dem-AI systems.
    Quality Inference in Federated Learning with Secure Aggregation. (arXiv:2007.06236v3 [cs.LG] UPDATED)
    Federated learning algorithms are developed both for efficiency reasons and to ensure the privacy and confidentiality of personal and business data, respectively. Despite no data being shared explicitly, recent studies showed that the mechanism could still leak sensitive information. Hence, secure aggregation is utilized in many real-world scenarios to prevent attribution to specific participants. In this paper, we focus on the quality of individual training datasets and show that such quality information could be inferred and attributed to specific participants even when secure aggregation is applied. Specifically, through a series of image recognition experiments, we infer the relative quality ordering of participants. Moreover, we apply the inferred quality information to detect misbehaviours, to stabilize training performance, and to measure the individual contributions of participants.
    Computer Vision for Road Imaging and Pothole Detection: A State-of-the-Art Review of Systems and Algorithms. (arXiv:2204.13590v1 [cs.CV])
    Computer vision algorithms have been prevalently utilized for 3-D road imaging and pothole detection for over two decades. Nonetheless, there is a lack of systematic survey articles on state-of-the-art (SoTA) computer vision techniques, especially deep learning models, developed to tackle these problems. This article first introduces the sensing systems employed for 2-D and 3-D road data acquisition, including camera(s), laser scanners, and Microsoft Kinect. Afterward, it thoroughly and comprehensively reviews the SoTA computer vision algorithms, including (1) classical 2-D image processing, (2) 3-D point cloud modeling and segmentation, and (3) machine/deep learning, developed for road pothole detection. This article also discusses the existing challenges and future development trends of computer vision-based road pothole detection approaches: classical 2-D image processing-based and 3-D point cloud modeling and segmentation-based approaches have already become history; and Convolutional neural networks (CNNs) have demonstrated compelling road pothole detection results and are promising to break the bottleneck with the future advances in self/un-supervised learning for multi-modal semantic segmentation. We believe that this survey can serve as practical guidance for developing the next-generation road condition assessment systems.
    Russian Texts Detoxification with Levenshtein Editing. (arXiv:2204.13638v1 [cs.CL])
    Text detoxification is a style transfer task of creating neutral versions of toxic texts. In this paper, we use the concept of text editing to build a two-step tagging-based detoxification model using a parallel corpus of Russian texts. With this model, we achieved the best style transfer accuracy among all models in the RUSSE Detox shared task, surpassing larger sequence-to-sequence models.
    Nonbacktracking spectral clustering of nonuniform hypergraphs. (arXiv:2204.13586v1 [cs.SI])
    Spectral methods offer a tractable, global framework for clustering in graphs via eigenvector computations on graph matrices. Hypergraph data, in which entities interact on edges of arbitrary size, poses challenges for matrix representations and therefore for spectral clustering. We study spectral clustering for nonuniform hypergraphs based on the hypergraph nonbacktracking operator. After reviewing the definition of this operator and its basic properties, we prove a theorem of Ihara-Bass type to enable faster computation of eigenpairs. We then propose an alternating algorithm for inference in a hypergraph stochastic blockmodel via linearized belief-propagation, offering proofs that both formalize and extend several previous results. We perform experiments in real and synthetic data that underscore the benefits of hypergraph methods over graph-based ones when interactions of different sizes carry different information about cluster structure. Through an analysis of our algorithm, we pose several conjectures about the limits of spectral methods and detectability in hypergraph stochastic blockmodels writ large.
    PhysioGAN: Training High Fidelity Generative Model for Physiological Sensor Readings. (arXiv:2204.13597v1 [eess.SP])
    Generative models such as the variational autoencoder (VAE) and the generative adversarial network (GAN) have proven to be incredibly powerful for the generation of synthetic data that preserves statistical properties and utility of real-world datasets, especially in the context of image and natural language text. Nevertheless, until now, there has been no successful demonstration of how to apply either method for generating useful physiological sensory data. The state-of-the-art techniques in this context have achieved only limited success. We present PHYSIOGAN, a generative model to produce high fidelity synthetic physiological sensor data readings. PHYSIOGAN consists of an encoder, a decoder, and a discriminator. We evaluate PHYSIOGAN against the state-of-the-art techniques using two different real-world datasets: ECG classification and activity recognition from motion sensors datasets. We compare PHYSIOGAN to the baseline models not only on the accuracy of class conditional generation but also on the sample diversity and sample novelty of the synthetic datasets. We prove that PHYSIOGAN generates samples with higher utility than other generative models by showing that classification models trained on only synthetic data generated by PHYSIOGAN have only 10% and 20% decrease in their classification accuracy relative to classification models trained on the real data. Furthermore, we demonstrate the use of PHYSIOGAN for sensor data imputation in creating plausible results.
    Personalized Federated Learning with Multiple Known Clusters. (arXiv:2204.13619v1 [cs.LG])
    We consider the problem of personalized federated learning when there are known cluster structures within users. An intuitive approach would be to regularize the parameters so that users in the same cluster share similar model weights. The distances between the clusters can then be regularized to reflect the similarity between different clusters of users. We develop an algorithm that allows each cluster to communicate independently and derive the convergence results. We study a hierarchical linear model to theoretically demonstrate that our approach outperforms agents learning independently and agents learning a single shared weight. Finally, we demonstrate the advantages of our approach using both simulated and real-world data.
    Zero-Shot Logit Adjustment. (arXiv:2204.11822v2 [cs.CV] UPDATED)
    Semantic-descriptor-based Generalized Zero-Shot Learning (GZSL) poses challenges in recognizing novel classes in the test phase. The development of generative models enables current GZSL techniques to probe further into the semantic-visual link, culminating in a two-stage form that includes a generator and a classifier. However, existing generation-based methods focus on enhancing the generator's effect while neglecting the improvement of the classifier. In this paper, we first analyze two properties of the generated pseudo unseen samples: bias and homogeneity. Then, we perform variational Bayesian inference to back-derive the evaluation metrics, which reflect the balance of the seen and unseen classes. As a consequence of our derivation, the aforementioned two properties are incorporated into the classifier training as seen-unseen priors via logit adjustment. The Zero-Shot Logit Adjustment further puts semantic-based classifiers into effect in generation-based GZSL. Our experiments demonstrate that the proposed technique achieves state-of-the-art results when combined with the basic generator, and it can improve various generative zero-shot learning frameworks. Our code is available at https://github.com/cdb342/IJCAI-2022-ZLA.
    Continual Learning for Peer-to-Peer Federated Learning: A Study on Automated Brain Metastasis Identification. (arXiv:2204.13591v1 [cs.LG])
    Due to data privacy constraints, data sharing among multiple centers is restricted. Continual learning, as one approach to peer-to-peer federated learning, can promote multicenter collaboration on deep learning algorithm development by sharing intermediate models instead of training data. This work aims to investigate the feasibility of continual learning for multicenter collaboration on an exemplary application of brain metastasis identification using DeepMedic. 920 T1 MRI contrast enhanced volumes are split to simulate multicenter collaboration scenarios. A continual learning algorithm, synaptic intelligence (SI), is applied to preserve important model weights for training one center after another. In a bilateral collaboration scenario, continual learning with SI achieves a sensitivity of 0.917, and naive continual learning without SI achieves a sensitivity of 0.906, while two models trained on internal data solely without continual learning achieve sensitivity of 0.853 and 0.831 only. In a seven-center multilateral collaboration scenario, the models trained on internal datasets (100 volumes each center) without continual learning obtain a mean sensitivity value of 0.725. With single-visit continual learning (i.e., the shared model visits each center only once during training), the sensitivity is improved to 0.788 and 0.849 without SI and with SI, respectively. With iterative continual learning (i.e., the shared model revisits each center multiple times during training), the sensitivity is further improved to 0.914, which is identical to the sensitivity using mixed data for training. Our experiments demonstrate that continual learning can improve brain metastasis identification performance for centers with limited data. This study demonstrates the feasibility of applying continual learning for peer-to-peer federated learning in multicenter collaboration.
    On tuning a mean-field model for semi-supervised classification. (arXiv:2204.13519v1 [cs.LG])
    Semi-supervised learning (SSL) has become an interesting research area due to its capacity for learning in scenarios where both labeled and unlabeled data are available. In this work, we focus on the task of transduction - when the objective is to label all data presented to the learner - with a mean-field approximation to the Potts model. Aiming at this particular task we study how classification results depend on $\beta$ and find that the optimal phase depends highly on the amount of labeled data available. In the same study, we also observe that more stable classifications regarding small fluctuations in $\beta$ are related to configurations of high probability and propose a tuning approach based on such observation. This method relies on a novel parameter $\gamma$ and we then evaluate two different values of the said quantity in comparison with classical methods in the field. This evaluation is conducted by changing the amount of labeled data available and the number of nearest neighbors in the similarity graph. Empirical results show that the tuning method is effective and allows NMF to outperform other approaches in datasets with fewer classes. In addition, one of the chosen values for $\gamma$ also leads to results that are more resilient to changes in the number of neighbors, which might be of interest to practitioners in the field of SSL.
    EVI: Multilingual Spoken Dialogue Tasks and Dataset for Knowledge-Based Enrolment, Verification, and Identification. (arXiv:2204.13496v1 [cs.CL])
    Knowledge-based authentication is crucial for task-oriented spoken dialogue systems that offer personalised and privacy-focused services. Such systems should be able to enrol (E), verify (V), and identify (I) new and recurring users based on their personal information, e.g. postcode, name, and date of birth. In this work, we formalise the three authentication tasks and their evaluation protocols, and we present EVI, a challenging spoken multilingual dataset with 5,506 dialogues in English, Polish, and French. Our proposed models set the first competitive benchmarks, explore the challenges of multilingual natural language processing of spoken dialogue, and set directions for future research.
    Learning General Inventory Management Policy for Large Supply Chain Network. (arXiv:2204.13378v1 [cs.AI])
    Inventory management in warehouses directly affects profits made by manufacturers. Particularly, large manufacturers produce a very large variety of products that are handled by a significantly large number of retailers. In such a case, the computational complexity of classical inventory management algorithms is inordinately large. In recent years, learning-based approaches have become popular for addressing such problems. However, previous studies have not managed systems where both the number of products and the number of retailers are large. This study proposes a reinforcement learning-based warehouse inventory management algorithm that can be used for supply chain systems where both the number of products and the number of retailers are large. To solve the computational problem of handling large systems, we provide a means of approximate simulation of the system in the training phase. Our experiments on both real and artificial data demonstrate that our algorithm with approximated simulation can successfully handle large supply chain networks.
    Mixup-based Deep Metric Learning Approaches for Incomplete Supervision. (arXiv:2204.13572v1 [cs.LG])
    Deep learning architectures have achieved promising results in different areas (e.g., medicine, agriculture, and security). However, using those powerful techniques in many real applications becomes challenging due to the large labeled collections required during training. Several works have pursued solutions to overcome it by proposing strategies that can learn more for less, e.g., weakly and semi-supervised learning approaches. As these approaches do not usually address memorization and sensitivity to adversarial examples, this paper presents three deep metric learning approaches combined with Mixup for incomplete-supervision scenarios. We show that some state-of-the-art approaches in metric learning might not work well in such scenarios. Moreover, the proposed approaches outperform most of them in different datasets.
    Adversarial Fine-tune with Dynamically Regulated Adversary. (arXiv:2204.13232v1 [cs.LG])
    Adversarial training is an effective method to boost model robustness to malicious, adversarial attacks. However, such improvement in model robustness often leads to a significant sacrifice of standard performance on clean images. In many real-world applications such as health diagnosis and autonomous surgical robotics, the standard performance is valued more than model robustness against such extremely malicious attacks. This leads to the question: To what extent can we boost model robustness without sacrificing standard performance? This work tackles this problem and proposes a simple yet effective transfer learning-based adversarial training strategy that disentangles the negative effects of adversarial samples on the model's standard performance. In addition, we introduce a training-friendly adversarial attack algorithm, which facilitates the boost of adversarial robustness without introducing significant training complexity. Extensive experimentation indicates that the proposed method outperforms previous adversarial training algorithms towards the target: to improve model robustness while preserving the model's standard performance on clean data.
    Semantic Communication: An Information Bottleneck View. (arXiv:2204.13366v1 [cs.IT])
    Motivated by recent success of machine learning tools at the PHY layer and driven by high bandwidth demands of the next wireless communication standard 6G, the old idea of semantic communication by Weaver from 1949 has received considerable attention. It breaks with the classic design paradigm according to Shannon by aiming to transmit the meaning of a message rather than its exact copy and thus potentially allows for savings in bandwidth. In this work, inspired by Weaver, we propose an information-theoretic framework where the semantic context is explicitly introduced into probabilistic models. In particular, for bandwidth efficient transmission, we define semantic communication system design as an Information Bottleneck optimization problem and consider important implementation aspects. Further, we uncover the restrictions of the classic 5G communication system design w.r.t. semantic context. Notably, based on the example of distributed image classification, we reveal the huge potential of a semantic communication system design. Numerical results show a tremendous saving in bandwidth of 20 dB with our proposed approach ISCNet compared to a classic PHY layer design.
    WeaNF: Weak Supervision with Normalizing Flows. (arXiv:2204.13409v1 [cs.CL])
    A popular approach to decrease the need for costly manual annotation of large data sets is weak supervision, which introduces problems of noisy labels, coverage and bias. Methods for overcoming these problems have relied either on discriminative models, trained with cost functions specific to weak supervision, or, more recently, on generative models, trying to model the output of the automatic annotation process. In this work, we explore a novel direction of generative modeling for weak supervision: Instead of modeling the output of the annotation process (the labeling function matches), we generatively model the input-side data distributions (the feature space) covered by labeling functions. Specifically, we estimate a density for each weak labeling source, or labeling function, by using normalizing flows. An integral part of our method is the flow-based modeling of multiple simultaneously matching labeling functions, and therefore phenomena such as labeling function overlap and correlations are captured. We analyze the effectiveness and modeling capabilities on various commonly used weak supervision data sets, and show that weakly supervised normalizing flows compare favorably to standard weak supervision baselines.
    Continual Backprop: Stochastic Gradient Descent with Persistent Randomness. (arXiv:2108.06325v2 [cs.LG] UPDATED)
    The Backprop algorithm for learning in neural networks utilizes two mechanisms: first, stochastic gradient descent and second, initialization with small random weights, where the latter is essential to the effectiveness of the former. We show that in continual learning setups, Backprop performs well initially, but over time its performance degrades. Stochastic gradient descent alone is insufficient to learn continually; the initial randomness enables only initial learning but not continual learning. To the best of our knowledge, ours is the first result showing this degradation in Backprop's ability to learn. To address this degradation in Backprop's plasticity, we propose an algorithm that continually injects random features alongside gradient descent using a new generate-and-test process. We call this the Continual Backprop algorithm. We show that, unlike Backprop, Continual Backprop is able to continually adapt in both supervised and reinforcement learning (RL) problems. Continual Backprop has the same computational complexity as Backprop and can be seen as a natural extension of Backprop for continual learning.
    Schrödinger's FP: Dynamic Adaptation of Floating-Point Containers for Deep Learning Training. (arXiv:2204.13666v1 [cs.LG])
    We introduce a software-hardware co-design approach to reduce memory traffic and footprint during training with BFloat16 or FP32, boosting energy efficiency and execution time performance. We introduce methods to dynamically adjust the size and format of the floating-point containers used to store activations and weights during training. The different value distributions lead us to different approaches for exponents and mantissas. Gecko exploits the favourable exponent distribution with a lossless delta encoding approach to reduce the total exponent footprint by up to $58\%$ in comparison to a 32 bit floating point baseline. To contend with the noisy mantissa distributions, we present two lossy methods to eliminate as many least significant bits as possible while not affecting accuracy. Quantum Mantissa is a machine learning-first mantissa compression method that taps into training's gradient descent algorithm to also learn minimal mantissa bitlengths on a per-layer granularity, and obtains up to $92\%$ reduction in total mantissa footprint. Alternatively, BitChop observes changes in the loss function during training to adjust mantissa bit-length network-wide, yielding a reduction of $81\%$ in footprint. Schrödinger's FP implements hardware encoders/decoders that, guided by Gecko/Quantum Mantissa or Gecko/BitChop, transparently encode/decode values when transferring to/from off-chip memory, boosting energy efficiency and reducing execution time.
    Interpretable Graph Convolutional Network of Multi-Modality Brain Imaging for Alzheimer's Disease Diagnosis. (arXiv:2204.13188v1 [cs.LG])
    Identification of brain regions related to specific neurological disorders is of great importance for biomarker and diagnostic studies. In this paper, we propose an interpretable Graph Convolutional Network (GCN) framework for the identification and classification of Alzheimer's disease (AD) using multi-modality brain imaging data. Specifically, we extended the Gradient Class Activation Mapping (Grad-CAM) technique to quantify the most discriminative features identified by GCN from brain connectivity patterns. We then utilized them to find signature regions of interest (ROIs) by detecting the difference of features between regions in healthy control (HC), mild cognitive impairment (MCI), and AD groups. We conducted the experiments on the ADNI database with imaging data from three modalities, including VBM-MRI, FDG-PET, and AV45-PET, and showed that the ROI features learned by our method were effective for enhancing the performances of both clinical score prediction and disease status identification. It also successfully identified biomarkers associated with AD and MCI.  ( 2 min )
    BAGNet: Bidirectional Aware Guidance Network for Malignant Breast lesions Segmentation. (arXiv:2204.13342v1 [eess.IV])
    Breast lesion segmentation is an important step in computer-aided diagnosis systems, and it has attracted much attention. However, accurate segmentation of malignant breast lesions is a challenging task due to the effects of heterogeneous structure and similar intensity distributions. In this paper, a novel bidirectional aware guidance network (BAGNet) is proposed to segment the malignant lesion from breast ultrasound images. Specifically, the bidirectional aware guidance network is used to capture the context between global (low-level) and local (high-level) features from the input coarse saliency map. The introduction of the global feature map can reduce the interference of surrounding tissue (background) on the lesion regions. To evaluate the segmentation performance of the network, we compared it with several state-of-the-art medical image segmentation methods on the public breast ultrasound dataset using six commonly used evaluation metrics. Extensive experimental results indicate that our method achieves the most competitive segmentation results on malignant breast ultrasound images.  ( 2 min )
    Towards Flexible Inference in Sequential Decision Problems via Bidirectional Transformers. (arXiv:2204.13326v1 [cs.LG])
    Randomly masking and predicting word tokens has been a successful approach in pre-training language models for a variety of downstream tasks. In this work, we observe that the same idea also applies naturally to sequential decision making, where many well-studied tasks like behavior cloning, offline RL, inverse dynamics, and waypoint conditioning correspond to different sequence maskings over a sequence of states, actions, and returns. We introduce the FlexiBiT framework, which provides a unified way to specify models which can be trained on many different sequential decision making tasks. We show that a single FlexiBiT model is simultaneously capable of carrying out many tasks with performance similar to or better than specialized models. Additionally, we show that performance can be further improved by fine-tuning our general model on specific tasks of interest.  ( 2 min )
    Offline Visual Representation Learning for Embodied Navigation. (arXiv:2204.13226v1 [cs.CV])
    How should we learn visual representations for embodied agents that must see and move? The status quo is tabula rasa in vivo, i.e. learning visual representations from scratch while also learning to move, potentially augmented with auxiliary tasks (e.g. predicting the action taken between two successive observations). In this paper, we show that an alternative 2-stage strategy is far more effective: (1) offline pretraining of visual representations with self-supervised learning (SSL) using large-scale pre-rendered images of indoor environments (Omnidata), and (2) online finetuning of visuomotor representations on specific tasks with image augmentations under long learning schedules. We call this method Offline Visual Representation Learning (OVRL). We conduct large-scale experiments - on 3 different 3D datasets (Gibson, HM3D, MP3D), 2 tasks (ImageNav, ObjectNav), and 2 policy learning algorithms (RL, IL) - and find that the OVRL representations lead to significant across-the-board improvements in the state of the art, on ImageNav from 29.2% to 54.2% (+25% absolute, 86% relative) and on ObjectNav from 18.1% to 23.2% (+5.1% absolute, 28% relative). Importantly, both results were achieved by the same visual encoder generalizing to datasets that were not seen during pretraining. While the benefits of pretraining sometimes diminish (or entirely disappear) with long finetuning schedules, we find that OVRL's performance gains continue to increase (not decrease) as the agent is trained for 2 billion frames of experience.  ( 2 min )
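The absolute and relative gains quoted in the abstract above follow from simple arithmetic, which the short sketch below reproduces. The helper name `improvement` is hypothetical and introduced only for this check; the four input percentages are the ones stated in the abstract.

```python
def improvement(old, new):
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = new - old
    relative = 100.0 * absolute / old
    return absolute, relative

# Success rates quoted in the abstract, in percent.
imagenav_abs, imagenav_rel = improvement(29.2, 54.2)
objectnav_abs, objectnav_rel = improvement(18.1, 23.2)

print(f"ImageNav:  +{imagenav_abs:.1f} points absolute, {imagenav_rel:.0f}% relative")
print(f"ObjectNav: +{objectnav_abs:.1f} points absolute, {objectnav_rel:.0f}% relative")
```

The "absolute" figures are differences in percentage points, while the "relative" figures divide that difference by the earlier result, which is why the smaller ObjectNav baseline still yields a 28% relative gain from a 5.1-point change.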
    Anomaly Detection by Leveraging Incomplete Anomalous Knowledge with Anomaly-Aware Bidirectional GANs. (arXiv:2204.13335v1 [cs.LG])
    The goal of anomaly detection is to identify anomalous samples from normal ones. In this paper, a small number of anomalies are assumed to be available at the training stage, but they are assumed to be collected only from several anomaly types, leaving the majority of anomaly types not represented in the collected anomaly dataset at all. To effectively leverage this kind of incomplete anomalous knowledge represented by the collected anomalies, we propose to learn a probability distribution that can not only model the normal samples, but also guarantee to assign low density values for the collected anomalies. To this end, an anomaly-aware generative adversarial network (GAN) is developed, which, in addition to modeling the normal samples as most GANs do, is able to explicitly avoid assigning probabilities for collected anomalous samples. Moreover, to facilitate the computation of anomaly detection criteria like reconstruction error, the proposed anomaly-aware GAN is designed to be bidirectional, attaching an encoder for the generator. Extensive experimental results demonstrate that our proposed method is able to effectively make use of the incomplete anomalous information, leading to significant performance gains compared to existing methods.  ( 2 min )
    TransHER: Translating Knowledge Graph Embedding with Hyper-Ellipsoidal Restriction. (arXiv:2204.13221v1 [cs.AI])
    Knowledge graph embedding methods are important for knowledge graph completion (link prediction) due to their robust performance and efficiency on large-magnitude datasets. One state-of-the-art method, PairRE, leverages two separate vectors for relations to model complex relations (i.e., 1-to-N, N-to-1, and N-to-N) in knowledge graphs. However, such a method strictly restricts entities on the hyper-ellipsoid surface and thus limits the optimization of entity distribution, which largely hinders the performance of knowledge graph completion. To address this problem, we propose a novel score function TransHER, which leverages relation-specific translations between head and tail entities restricted on separate hyper-ellipsoids. Specifically, given a triplet, our model first maps entities onto two separate hyper-ellipsoids and then conducts a relation-specific translation on one of them. The relation-specific translation provides TransHER with more direct guidance in optimization and the ability to learn semantic characteristics of entities with complex relations. Experimental results show that TransHER can achieve state-of-the-art performance and generalize to datasets in different domains and scales. All our code will be publicly available.  ( 2 min )
    Open challenges for Machine Learning based Early Decision-Making research. (arXiv:2204.13111v1 [cs.LG])
    More and more applications require early decisions, i.e. taken as soon as possible from partially observed data. However, the later a decision is made, the more its accuracy tends to improve, since the description of the problem to hand is enriched over time. Such a compromise between the earliness and the accuracy of decisions has been particularly studied in the field of Early Time Series Classification. This paper introduces a more general problem, called Machine Learning based Early Decision Making (ML-EDM), which consists in optimizing the decision times of models in a wide range of settings where data is collected over time. After defining the ML-EDM problem, ten challenges are identified and proposed to the scientific community to further research in this area. These challenges open important application perspectives, discussed in this paper.  ( 2 min )
    Covariance-aware Feature Alignment with Pre-computed Source Statistics for Test-time Adaptation. (arXiv:2204.13263v1 [cs.LG])
    The accuracy of deep neural networks is degraded when the distribution of features in the test environment (target domain) differs from that of the training (source) environment. To mitigate the degradation, test-time adaptation (TTA), where a model adapts to the target domain without access to the source dataset, can be used in the test environment. However, the existing TTA methods lack feature distribution alignment between the source and target domains, which unsupervised domain adaptation mainly addresses, because accessing the source dataset is prohibited in the TTA setting. In this paper, we propose a novel TTA method, named Covariance-Aware Feature alignment (CAFe), which explicitly aligns the source and target feature distributions at test time. To perform alignment without accessing the source data, CAFe uses auxiliary feature statistics (mean and covariance) pre-computed on the source domain, which are lightweight and easily prepared. Further, to improve efficiency and stability, we propose feature grouping, which splits the feature dimensions into groups according to their correlations by using spectral clustering to avoid degeneration of the covariance matrix. We empirically show that CAFe outperforms prior TTA methods on a variety of distribution shifts.  ( 2 min )
    Counterfactual Explanations for Natural Language Interfaces. (arXiv:2204.13192v1 [cs.CL])
    A key challenge facing natural language interfaces is enabling users to understand the capabilities of the underlying system. We propose a novel approach for generating explanations of a natural language interface based on semantic parsing. We focus on counterfactual explanations, which are post-hoc explanations that describe to the user how they could have minimally modified their utterance to achieve their desired goal. In particular, the user provides an utterance along with a demonstration of their desired goal; then, our algorithm synthesizes a paraphrase of their utterance that is guaranteed to achieve their goal. In two user studies, we demonstrate that our approach substantially improves user performance, and that it generates explanations that more closely match the user's intent compared to two ablations.  ( 2 min )
    On the Convergence of Momentum-Based Algorithms for Federated Stochastic Bilevel Optimization Problems. (arXiv:2204.13299v1 [cs.LG])
    In this paper, we studied the federated stochastic bilevel optimization problem. In particular, we developed two momentum-based algorithms for optimizing this kind of problem. In addition, we established the convergence rate of these two algorithms, providing their sample and communication complexities. To the best of our knowledge, this is the first work achieving such favorable theoretical results.  ( 2 min )
    Watts: Infrastructure for Open-Ended Learning. (arXiv:2204.13250v1 [cs.AI])
    This paper proposes a framework called Watts for implementing, comparing, and recombining open-ended learning (OEL) algorithms. Motivated by modularity and algorithmic flexibility, Watts atomizes the components of OEL systems to promote the study of and direct comparisons between approaches. Examining implementations of three OEL algorithms, the paper introduces the modules of the framework. The hope is for Watts to enable benchmarking and to explore new types of OEL algorithms. The repo is available at https://github.com/aadharna/watts  ( 2 min )
    R-MBO: A Multi-surrogate Approach for Preference Incorporation in Multi-objective Bayesian Optimisation. (arXiv:2204.13166v1 [stat.ML])
    Many real-world multi-objective optimisation problems rely on computationally expensive function evaluations. Multi-objective Bayesian optimisation (BO) can be used to alleviate the computation time to find an approximated set of Pareto optimal solutions. In many real-world problems, a decision-maker has some preferences on the objective functions. One approach to incorporate the preferences in multi-objective BO is to use a scalarising function and build a single surrogate model (mono-surrogate approach) on it. This approach has two major limitations. Firstly, the fitness landscape of the scalarising function and the objective functions may not be similar. Secondly, the approach assumes that the scalarising function distribution is Gaussian, and thus that a closed-form expression of an acquisition function, e.g., expected improvement, can be used. We overcome these limitations by building independent surrogate models (multi-surrogate approach) on each objective function and show that the distribution of the scalarising function is not Gaussian. We approximate the distribution using Generalised value distribution. We present an a-priori multi-surrogate approach to incorporate the desirable objective function values (or reference point) as the preferences of a decision-maker in multi-objective BO. The results and comparison with the existing mono-surrogate approach on benchmark and real-world optimisation problems show the potential of the proposed approach.  ( 2 min )
  • Open

    Exchangeability-Aware Sum-Product Networks. (arXiv:2110.05165v2 [cs.LG] UPDATED)
    Sum-Product Networks (SPNs) are expressive probabilistic models that provide exact, tractable inference. They achieve this efficiency by making use of local independence. On the other hand, mixtures of exchangeable variable models (MEVMs) are a class of tractable probabilistic models that make use of exchangeability of discrete random variables to render inference tractable. Exchangeability, which arises naturally in relational domains, has not been considered for efficient representation and inference in SPNs yet. The contribution of this paper is a novel probabilistic model which we call Exchangeability-Aware Sum-Product Networks (XSPNs). It contains both SPNs and MEVMs as special cases, and combines the ability of SPNs to efficiently learn deep probabilistic models with the ability of MEVMs to efficiently handle exchangeable random variables. We introduce a structure learning algorithm for XSPNs and empirically show that they can be more accurate than conventional SPNs when the data contains repeated, interchangeable parts.  ( 2 min )
    Performance analysis of greedy algorithms for minimising a Maximum Mean Discrepancy. (arXiv:2101.07564v2 [stat.ML] UPDATED)
    We analyse the performance of several iterative algorithms for the quantisation of a probability measure $\mu$, based on the minimisation of a Maximum Mean Discrepancy (MMD). Our analysis includes kernel herding, greedy MMD minimisation and Sequential Bayesian Quadrature (SBQ). We show that the finite-sample-size approximation error, measured by the MMD, decreases as $1/n$ for SBQ and also for kernel herding and greedy MMD minimisation when using a suitable step-size sequence. The upper bound on the approximation error is slightly better for SBQ, but the other methods are significantly faster, with a computational cost that increases only linearly with the number of points selected. This is illustrated by two numerical examples, with the target measure $\mu$ being uniform (a space-filling design application) and with $\mu$ a Gaussian mixture. They suggest that the bounds derived in the paper are overly pessimistic, in particular for SBQ. The sources of this pessimism are identified but seem difficult to counter.  ( 2 min )
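    To make the kernel herding step concrete, here is a minimal one-dimensional sketch. It is an illustration under stated assumptions, not the paper's code: the Gaussian kernel, the bandwidth of 0.2, the finite candidate grid, and the names `kernel_herding` and `mmd_squared` are all chosen here for the example. Each iteration picks the candidate that maximises the kernel mean of the target minus the average kernel similarity to the points already selected.

```python
import math


def gaussian_kernel(x, y, bandwidth=0.2):
    # Assumed kernel for this sketch; the abstract does not fix one.
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def kernel_herding(target, candidates, n_points, kernel=gaussian_kernel):
    """Greedily pick n_points from candidates to approximate the empirical target."""
    # Kernel mean ("potential") of the target, evaluated at every candidate.
    potential = [sum(kernel(c, y) for y in target) / len(target) for c in candidates]
    selected = []
    for n in range(n_points):
        best, best_score = None, -math.inf
        for c, p in zip(candidates, potential):
            # Penalise candidates that are close to points already chosen.
            repulsion = sum(kernel(c, s) for s in selected) / (n + 1)
            score = p - repulsion
            if score > best_score:
                best, best_score = c, score
        selected.append(best)
    return selected


def mmd_squared(points, target, kernel=gaussian_kernel):
    """Squared MMD between two uniformly weighted empirical measures."""
    n, m = len(points), len(target)
    pp = sum(kernel(a, b) for a in points for b in points) / (n * n)
    pt = sum(kernel(a, y) for a in points for y in target) / (n * m)
    tt = sum(kernel(x, y) for x in target for y in target) / (m * m)
    return pp - 2.0 * pt + tt


# Uniform target on [0, 1], discretised on a grid that also serves as candidate set.
grid = [i / 20 for i in range(21)]
points = kernel_herding(grid, grid, 5)
```

    Calling `mmd_squared(points, grid)` gives the approximation error that the abstract's $1/n$ rate refers to; in this sketch the points carry equal weights, which is the simplest choice of step-size sequence.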
    Tracking Most Significant Arm Switches in Bandits. (arXiv:2112.13838v5 [cs.LG] UPDATED)
    In bandits with distribution shifts, one aims to automatically adapt to unknown changes in reward distribution, and restart exploration when necessary. While this problem has been studied for many years, a recent breakthrough of Auer et al. (2018, 2019) provides the first adaptive procedure to guarantee an optimal (dynamic) regret $\sqrt{LT}$, for $T$ rounds, and an unknown number $L$ of changes. However, while this rate is tight in the worst case, it remained open whether faster rates are possible, without prior knowledge, if few changes in distribution are actually severe. To resolve this question, we propose a new notion of significant shift, which only counts very severe changes that clearly necessitate a restart: roughly, these are changes involving not only best arm switches, but also involving large aggregate differences in reward over time. Thus, our resulting procedure adaptively achieves rates always faster (sometimes significantly) than $O(\sqrt{ST})$, where $S\ll L$ only counts best arm switches, while at the same time, always faster than the optimal $O(V^{\frac{1}{3}}T^{\frac{2}{3}})$ when expressed in terms of total variation $V$ (which aggregates differences over time). Our results are expressed in enough generality to also capture non-stochastic adversarial settings.  ( 2 min )
    Variational Inference with NoFAS: Normalizing Flow with Adaptive Surrogate for Computationally Expensive Models. (arXiv:2108.12657v2 [cs.LG] UPDATED)
    Fast inference of numerical model parameters from data is an important prerequisite to generate predictive models for a wide range of applications. Use of sampling-based approaches such as Markov chain Monte Carlo may become intractable when each likelihood evaluation is computationally expensive. New approaches combining variational inference with normalizing flow are characterized by a computational cost that grows only linearly with the dimensionality of the latent variable space, and rely on gradient-based optimization instead of sampling, providing a more efficient approach for Bayesian inference about the model parameters. Moreover, the cost of frequently evaluating an expensive likelihood can be mitigated by replacing the true model with an offline trained surrogate model, such as a neural network. However, this approach might generate significant bias when the surrogate is insufficiently accurate around the posterior modes. To reduce the computational cost without sacrificing inferential accuracy, we propose Normalizing Flow with Adaptive Surrogate (NoFAS), an optimization strategy that alternately updates the normalizing flow parameters and surrogate model parameters. We also propose an efficient sample weighting scheme for surrogate model training that preserves global accuracy while effectively capturing high posterior density regions. We demonstrate the inferential and computational superiority of NoFAS against various benchmarks, including cases where the underlying model lacks identifiability. The source code and numerical experiments used for this study are available at https://github.com/cedricwangyu/NoFAS.  ( 2 min )
    Gaussian Processes and Statistical Decision-making in Non-Euclidean Spaces. (arXiv:2202.10613v3 [stat.ML] UPDATED)
    Bayesian learning using Gaussian processes provides a foundational framework for making decisions in a manner that balances what is known with what could be learned by gathering data. In this dissertation, we develop techniques for broadening the applicability of Gaussian processes. This is done in two ways. Firstly, we develop pathwise conditioning techniques for Gaussian processes, which allow one to express posterior random functions as prior random functions plus a dependent update term. We introduce a wide class of efficient approximations built from this viewpoint, which can be randomly sampled once in advance, and evaluated at arbitrary locations without any subsequent stochasticity. This key property improves efficiency and makes it simpler to deploy Gaussian process models in decision-making settings. Secondly, we develop a collection of Gaussian process models over non-Euclidean spaces, including Riemannian manifolds and graphs. We derive fully constructive expressions for the covariance kernels of scalar-valued Gaussian processes on Riemannian manifolds and graphs. Building on these ideas, we describe a formalism for defining vector-valued Gaussian processes on Riemannian manifolds. The introduced techniques allow all of these models to be trained using standard computational methods. In total, these contributions make Gaussian processes easier to work with and allow them to be used within a wider class of domains in an effective and principled manner. This, in turn, makes it possible to potentially apply Gaussian processes to novel decision-making settings.  ( 2 min )
    Partitioned Variational Inference: A Framework for Probabilistic Federated Learning. (arXiv:2202.12275v4 [stat.ML] UPDATED)
    The proliferation of computing devices has brought about an opportunity to deploy machine learning models on new problem domains using previously inaccessible data. Traditional algorithms for training such models often require data to be stored on a single machine with compute performed by a single node, making them unsuitable for decentralised training on multiple devices. This deficiency has motivated the development of federated learning algorithms, which allow multiple data owners to train collaboratively and use a shared model whilst keeping local data private. However, many of these algorithms focus on obtaining point estimates of model parameters, rather than probabilistic estimates capable of capturing model uncertainty, which is essential in many applications. Variational inference (VI) has become the method of choice for fitting many modern probabilistic models. In this paper we introduce partitioned variational inference (PVI), a general framework for performing VI in the federated setting. We develop new supporting theory for PVI, demonstrating a number of properties that make it an attractive choice for practitioners; use PVI to unify a wealth of fragmented, yet related literature; and provide empirical results that showcase the effectiveness of PVI in a variety of federated settings.  ( 2 min )
    Normalizing flows for atomic solids. (arXiv:2111.08696v2 [physics.comp-ph] UPDATED)
    We present a machine-learning approach, based on normalizing flows, for modelling atomic solids. Our model transforms an analytically tractable base distribution into the target solid without requiring ground-truth samples for training. We report Helmholtz free energy estimates for cubic and hexagonal ice modelled as monatomic water as well as for a truncated and shifted Lennard-Jones system, and find them to be in excellent agreement with literature values and with estimates from established baseline methods. We further investigate structural properties and show that the model samples are nearly indistinguishable from the ones obtained with molecular dynamics. Our results thus demonstrate that normalizing flows can provide high-quality samples and free energy estimates without the need for multi-staging.  ( 2 min )
    Forecasting Brain Activity Based on Models of Spatio-Temporal Brain Dynamics: A Comparison of Graph Neural Network Architectures. (arXiv:2112.04266v2 [q-bio.NC] UPDATED)
    Comprehending the interplay between spatial and temporal characteristics of neural dynamics can contribute to our understanding of information processing in the human brain. Graph neural networks (GNNs) provide a new possibility to interpret graph structured signals like those observed in complex brain networks. In our study we compare different spatio-temporal GNN architectures and study their ability to model neural activity distributions obtained in functional MRI (fMRI) studies. We evaluate the performance of the GNN models on a variety of scenarios in MRI studies and also compare it to a VAR model, which is currently often used for directed functional connectivity analysis. We show that by learning localized functional interactions on the anatomical substrate, GNN based approaches are able to robustly scale to large network studies, even when available data are scarce. By including anatomical connectivity as the physical substrate for information propagation, such GNNs also provide a multi-modal perspective on directed connectivity analysis, offering a novel possibility to investigate the spatio-temporal dynamics in brain networks.  ( 2 min )
    Learn then Test: Calibrating Predictive Algorithms to Achieve Risk Control. (arXiv:2110.01052v4 [cs.LG] UPDATED)
    We introduce a framework for calibrating machine learning models so that their predictions satisfy explicit, finite-sample statistical guarantees. Our calibration algorithm works with any underlying model and (unknown) data-generating distribution and does not require model refitting. The framework addresses, among other examples, false discovery rate control in multi-label classification, intersection-over-union control in instance segmentation, and the simultaneous control of the type-1 error of outlier detection and confidence set coverage in classification or regression. Our main insight is to reframe the risk-control problem as multiple hypothesis testing, enabling techniques and mathematical arguments different from those in the previous literature. We use our framework to provide new calibration methods for several core machine learning tasks with detailed worked examples in computer vision and tabular medical data.  ( 2 min )
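    The reframing of risk control as multiple hypothesis testing can be sketched in a few lines. The following is a simplified illustration, not the authors' implementation: it assumes losses bounded in [0, 1], uses a plain Hoeffding p-value, and applies a Bonferroni correction over a finite grid of candidate settings; the function names and the example numbers are hypothetical.

```python
import math


def hoeffding_p_value(empirical_risk, n, alpha):
    """p-value for the null hypothesis 'true risk > alpha' (losses assumed in [0, 1])."""
    gap = max(alpha - empirical_risk, 0.0)
    return math.exp(-2.0 * n * gap * gap)


def learn_then_test(empirical_risks, n, alpha, delta):
    """Return indices of candidate settings certified to have risk <= alpha.

    Bonferroni correction: each null is tested at level delta / m, so all
    returned settings are valid simultaneously with probability >= 1 - delta.
    """
    m = len(empirical_risks)
    return [
        j
        for j, risk in enumerate(empirical_risks)
        if hoeffding_p_value(risk, n, alpha) <= delta / m
    ]


# Hypothetical calibration results for three candidate thresholds, n = 1000 points.
selected = learn_then_test([0.5, 0.2, 0.01], n=1000, alpha=0.1, delta=0.1)
```

    Only the third setting is certified here: its empirical risk is far enough below `alpha` for the Hoeffding bound to reject the null at the corrected level, while the first two have empirical risk above `alpha` and receive a p-value of 1.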
    Self-organizing Democratized Learning: Towards Large-scale Distributed Learning Systems. (arXiv:2007.03278v3 [cs.LG] UPDATED)
    Emerging cross-device artificial intelligence (AI) applications require a transition from conventional centralized learning systems towards large-scale distributed AI systems that can collaboratively perform complex learning tasks. In this regard, democratized learning (Dem-AI) lays out a holistic philosophy with underlying principles for building large-scale distributed and democratized machine learning systems. The outlined principles are meant to study a generalization in distributed learning systems that goes beyond existing mechanisms such as federated learning. Moreover, such learning systems rely on hierarchical self-organization of well-connected distributed learning agents who have limited and highly personalized data and can evolve and regulate themselves based on the underlying duality of specialized and generalized processes. Inspired by Dem-AI philosophy, a novel distributed learning approach is proposed in this paper. The approach consists of a self-organizing hierarchical structuring mechanism based on agglomerative clustering, hierarchical generalization, and corresponding learning mechanism. Subsequently, hierarchical generalized learning problems in recursive forms are formulated and shown to be approximately solved using the solutions of distributed personalized learning problems and hierarchical update mechanisms. To that end, a distributed learning algorithm, namely DemLearn, is proposed. Extensive experiments on benchmark MNIST, Fashion-MNIST, FE-MNIST, and CIFAR-10 datasets show that the proposed algorithms achieve better generalization performance of learning models in agents compared to the conventional FL algorithms. The detailed analysis provides useful observations to further handle both the generalization and specialization performance of the learning models in Dem-AI systems.  ( 2 min )
    Adversarial Meta-Learning of Gamma-Minimax Estimators That Leverage Prior Knowledge. (arXiv:2012.05465v2 [stat.ME] UPDATED)
    Bayes estimators are well known to provide a means to incorporate prior knowledge that can be expressed in terms of a single prior distribution. However, when this knowledge is too vague to express with a single prior, an alternative approach is needed. Gamma-minimax estimators provide such an approach. These estimators minimize the worst-case Bayes risk over a set $\Gamma$ of prior distributions that are compatible with the available knowledge. Traditionally, Gamma-minimaxity is defined for parametric models. In this work, we define Gamma-minimax estimators for general models and propose adversarial meta-learning algorithms to compute them when the set of prior distributions is constrained by generalized moments. Accompanying convergence guarantees are also provided. We also introduce a neural network class that provides a rich, but finite-dimensional, class of estimators from which a Gamma-minimax estimator can be selected. We illustrate our method in two settings, namely entropy estimation and a prediction problem that arises in biodiversity studies.  ( 2 min )
    Signal Recovery with Non-Expansive Generative Network Priors. (arXiv:2204.13599v1 [eess.SP])
    We study compressive sensing with a deep generative network prior. Initial theoretical guarantees for efficient recovery from compressed linear measurements have been developed for signals in the range of a ReLU network with Gaussian weights and logarithmic expansivity: that is, when each layer is larger than the previous one by a logarithmic factor. It was later shown that constant expansivity is sufficient for recovery. It has remained open whether the expansivity can be relaxed, allowing for networks with contractive layers, as is often the case for real generators. In this work we answer this question, proving that a signal in the range of a Gaussian generative network can be recovered from a few linear measurements provided that the width of the layers is proportional to the input layer size (up to log factors). This condition allows the generative network to have contractive layers. Our result is based on showing that Gaussian matrices satisfy a matrix concentration inequality, which we term Range Restricted Weight Distribution Condition (R2WDC), and which weakens the Weight Distribution Condition (WDC) on which previous theoretical guarantees were based. The WDC has also been used to analyze other signal recovery problems with generative network priors. By replacing the WDC with the R2WDC, we are able to extend previous results for signal recovery with expansive generative network priors to non-expansive ones. We discuss these extensions for phase retrieval, denoising, and spiked matrix recovery.  ( 2 min )
    Multiplicative Updates for NMF with $\beta$-Divergences under Disjoint Equality Constraints. (arXiv:2010.16223v2 [cs.LG] UPDATED)
    Nonnegative matrix factorization (NMF) is the problem of approximating an input nonnegative matrix, $V$, as the product of two smaller nonnegative matrices, $W$ and $H$. In this paper, we introduce a general framework to design multiplicative updates (MU) for NMF based on $\beta$-divergences ($\beta$-NMF) with disjoint equality constraints, and with penalty terms in the objective function. By disjoint, we mean that each variable appears in at most one equality constraint. Our MU satisfy the set of constraints after each update of the variables during the optimization process, while guaranteeing that the objective function decreases monotonically. We showcase this framework on three NMF models, and show that it competes favorably with the state of the art: (1) $\beta$-NMF with sum-to-one constraints on the columns of $H$, (2) minimum-volume $\beta$-NMF with sum-to-one constraints on the columns of $W$, and (3) sparse $\beta$-NMF with $\ell_2$-norm constraints on the columns of $W$.  ( 2 min )
    Deep Orientation-Aware Functional Maps: Tackling Symmetry Issues in Shape Matching. (arXiv:2204.13453v1 [cs.CV])
    State-of-the-art fully intrinsic networks for non-rigid shape matching often struggle to disambiguate the symmetries of the shapes, leading to unstable correspondence predictions. Meanwhile, recent advances in the functional map framework make it possible to enforce orientation preservation using a functional representation for tangent vector field transfer, through so-called complex functional maps. Using this representation, we propose a new deep learning approach to learn orientation-aware features in a fully unsupervised setting. Our architecture is built on top of DiffusionNet, making it robust to discretization changes. Additionally, we introduce a vector field-based loss, which promotes orientation preservation without using (often unstable) extrinsic descriptors.  ( 2 min )
    Predicting single-cell perturbation responses for unseen drugs. (arXiv:2204.13545v1 [cs.LG])
    Single-cell transcriptomics enabled the study of cellular heterogeneity in response to perturbations at the resolution of individual cells. However, scaling high-throughput screens (HTSs) to measure cellular responses for many drugs remains a challenge due to technical limitations and, more importantly, the cost of such multiplexed experiments. Thus, transferring information from routinely performed bulk RNA-seq HTS is required to enrich single-cell data meaningfully. We introduce a new encoder-decoder architecture to study the perturbational effects of unseen drugs. We combine the model with a transfer learning scheme and demonstrate how training on existing bulk RNA-seq HTS datasets can improve generalisation performance. Better generalisation reduces the need for extensive and costly screens at single-cell resolution. We envision that our proposed method will facilitate more efficient experiment designs through its ability to generate in-silico hypotheses, ultimately accelerating targeted drug discovery.  ( 2 min )
    ELM: Embedding and Logit Margins for Long-Tail Learning. (arXiv:2204.13208v1 [cs.LG])
    Long-tail learning is the problem of learning under skewed label distributions, which pose a challenge for standard learners. Several recent approaches for the problem have proposed enforcing a suitable margin in logit space. Such techniques are intuitive analogues of the guiding principle behind SVMs, and are equally applicable to linear models and neural models. However, when applied to neural models, such techniques do not explicitly control the geometry of the learned embeddings. This can be potentially sub-optimal, since embeddings for tail classes may be diffuse, resulting in poor generalization for these classes. We present Embedding and Logit Margins (ELM), a unified approach to enforce margins in logit space, and regularize the distribution of embeddings. This connects losses for long-tail learning to proposals in the literature on metric embedding, and contrastive learning. We theoretically show that minimising the proposed ELM objective helps reduce the generalisation gap. The ELM method is shown to perform well empirically, and results in tighter tail class embeddings.  ( 2 min )
    Leader Stochastic Gradient Descent for Distributed Training of Deep Learning Models: Extension. (arXiv:1905.10395v5 [cs.LG] UPDATED)
    We consider distributed optimization under communication constraints for training deep learning models. We propose a new algorithm, whose parameter updates rely on two forces: a regular gradient step, and a corrective direction dictated by the currently best-performing worker (leader). Our method differs from the parameter-averaging scheme EASGD in a number of ways: (i) our objective formulation does not change the location of stationary points compared to the original optimization problem; (ii) we avoid convergence decelerations caused by pulling local workers descending to different local minima to each other (i.e. to the average of their parameters); (iii) our update by design breaks the curse of symmetry (the phenomenon of being trapped in poorly generalizing sub-optimal solutions in symmetric non-convex landscapes); and (iv) our approach is more communication efficient since it broadcasts only parameters of the leader rather than all workers. We provide theoretical analysis of the batch version of the proposed algorithm, which we call Leader Gradient Descent (LGD), and its stochastic variant (LSGD). Finally, we implement an asynchronous version of our algorithm and extend it to the multi-leader setting, where we form groups of workers, each represented by its own local leader (the best performer in a group), and update each worker with a corrective direction comprised of two attractive forces: one to the local, and one to the global leader (the best performer among all workers). The multi-leader setting is well-aligned with current hardware architecture, where local workers forming a group lie within a single computational node and different groups correspond to different nodes. For training convolutional neural networks, we empirically demonstrate that our approach compares favorably to state-of-the-art baselines. This work is a gentle extension of [2].  ( 3 min )
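    The two forces described above (a regular gradient step plus a pull towards the current leader) can be illustrated on a toy problem. This is a hedged sketch only: the one-dimensional quadratic loss, the step size, the pull strength, and the function names are assumptions made for the example, and the update uses full gradients rather than stochastic ones.

```python
def loss(x):
    # Toy objective with its minimum at x = 3 (assumed for illustration).
    return (x - 3.0) ** 2


def grad(x):
    return 2.0 * (x - 3.0)


def leader_step(workers, lr=0.1, pull=0.1):
    """One synchronous update: gradient step plus attraction to the best worker."""
    leader = min(workers, key=loss)  # currently best-performing worker
    return [x - lr * (grad(x) + pull * (x - leader)) for x in workers]


workers = [0.0, 10.0, -5.0]
for _ in range(100):
    workers = leader_step(workers)
```

    Because the pull term vanishes for the leader itself, the leader follows plain gradient descent, so in this sketch the stationary point of the toy objective is left unchanged, in line with point (i) of the abstract.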
    Overcoming Catastrophic Forgetting via Direction-Constrained Optimization. (arXiv:2011.12581v2 [cs.LG] UPDATED)
    This paper studies a new design of the optimization algorithm for training deep learning models with a fixed architecture of the classification network in a continual learning framework. The training data is non-stationary and the non-stationarity is imposed by a sequence of distinct tasks. We first analyze a deep model trained on only one learning task in isolation and identify a region in network parameter space, where the model performance is close to the recovered optimum. We provide empirical evidence that this region resembles a cone that expands along the convergence direction. We study the principal directions of the trajectory of the optimizer after convergence and show that traveling along a few top principal directions can quickly bring the parameters outside the cone but this is not the case for the remaining directions. We argue that catastrophic forgetting in a continual learning setting can be alleviated when the parameters are constrained to stay within the intersection of the plausible cones of individual tasks that were so far encountered during training. Based on this observation we present our direction-constrained optimization (DCO) method, where for each task we introduce a linear autoencoder to approximate its corresponding top forbidden principal directions. They are then incorporated into the loss function in the form of a regularization term for the purpose of learning the coming tasks without forgetting. Furthermore, in order to control the memory growth as the number of tasks increases, we propose a memory-efficient version of our algorithm called compressed DCO (DCO-COMP) that allocates a memory of fixed size for storing all autoencoders. We empirically demonstrate that our algorithm performs favorably compared to other state-of-the-art regularization-based continual learning methods.  ( 2 min )
    Unlocking High-Accuracy Differentially Private Image Classification through Scale. (arXiv:2204.13650v1 [cs.LG])
    Differential Privacy (DP) provides a formal privacy guarantee preventing adversaries with access to a machine learning model from extracting information about individual training points. Differentially Private Stochastic Gradient Descent (DP-SGD), the most popular DP training method, realizes this protection by injecting noise during training. However previous works have found that DP-SGD often leads to a significant degradation in performance on standard image classification benchmarks. Furthermore, some authors have postulated that DP-SGD inherently performs poorly on large models, since the norm of the noise required to preserve privacy is proportional to the model dimension. In contrast, we demonstrate that DP-SGD on over-parameterized models can perform significantly better than previously thought. Combining careful hyper-parameter tuning with simple techniques to ensure signal propagation and improve the convergence rate, we obtain a new SOTA on CIFAR-10 of 81.4% under (8, 10^{-5})-DP using a 40-layer Wide-ResNet, improving over the previous SOTA of 71.7%. When fine-tuning a pre-trained 200-layer Normalizer-Free ResNet, we achieve a remarkable 77.1% top-1 accuracy on ImageNet under (1, 8*10^{-7})-DP, and achieve 81.1% under (8, 8*10^{-7})-DP. This markedly exceeds the previous SOTA of 47.9% under a larger privacy budget of (10, 10^{-6})-DP. We believe our results are a significant step towards closing the accuracy gap between private and non-private image classification.  ( 2 min )
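    As a rough illustration of the "injecting noise during training" step described above, here is a minimal sketch of a generic DP-SGD update: clip each per-example gradient, sum, add Gaussian noise, then average. It is not the paper's training setup; the function names, the plain-list gradients and the parameters `clip_norm` and `noise_multiplier` are assumptions made for the example, and no privacy accounting is shown.

```python
import math
import random


def clip_gradient(grad, clip_norm):
    """Scale one per-example gradient so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One noisy update: clip each example's gradient, sum, add noise, average."""
    batch_size = len(per_example_grads)
    clipped = [clip_gradient(g, clip_norm) for g in per_example_grads]
    summed = [sum(column) for column in zip(*clipped)]
    # Noise standard deviation is tied to the clipping norm (hypothetical settings).
    sigma = noise_multiplier * clip_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [p - lr * n / batch_size for p, n in zip(params, noisy)]


if __name__ == "__main__":
    rng = random.Random(0)
    grads = [[3.0, 4.0], [0.3, 0.4]]
    print(dp_sgd_step([0.0, 0.0], grads, lr=0.1, clip_norm=1.0,
                      noise_multiplier=1.1, rng=rng))
```

    Clipping bounds how much any single training example can move the parameters, which is what lets the added noise mask an individual's contribution.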
    AlphaZero-Inspired General Board Game Learning and Playing. (arXiv:2204.13307v1 [cs.LG])
    Recently, the seminal algorithms AlphaGo and AlphaZero have started a new era in game learning and deep reinforcement learning. While the achievements of AlphaGo and AlphaZero - playing Go and other complex games at super human level - are truly impressive, these architectures have the drawback that they are very complex and require high computational resources. Many researchers are looking for methods that are similar to AlphaZero, but have lower computational demands and are thus more easily reproducible. In this paper, we pick an important element of AlphaZero - the Monte Carlo Tree Search (MCTS) planning stage - and combine it with reinforcement learning (RL) agents. We wrap MCTS for the first time around RL n-tuple networks to create versatile agents that keep at the same time the computational demands low. We apply this new architecture to several complex games (Othello, ConnectFour, Rubik's Cube) and show the advantages achieved with this AlphaZero-inspired MCTS wrapper. In particular, we present results that this AlphaZero-inspired agent is the first one trained on standard hardware (no GPU or TPU) to beat the very strong Othello program Edax up to and including level 7 (where most other algorithms could only defeat Edax up to level 2).  ( 2 min )
    Standardized Evaluation of Machine Learning Methods for Evolving Data Streams. (arXiv:2204.13625v1 [cs.LG])
    Due to the unspecified and dynamic nature of data streams, online machine learning requires powerful and flexible solutions. However, evaluating online machine learning methods under realistic conditions is difficult. Existing work therefore often draws on different heuristics and simulations that do not necessarily produce meaningful and reliable results. Indeed, in the absence of common evaluation standards, it often remains unclear how online learning methods will perform in practice or in comparison to similar work. In this paper, we propose a comprehensive set of properties for high-quality machine learning in evolving data streams. In particular, we discuss sensible performance measures and evaluation strategies for online predictive modelling, online feature selection and concept drift detection. As one of the first works, we also look at the interpretability of online learning methods. The proposed evaluation standards are provided in a new Python framework called float. Float is completely modular and allows the simultaneous integration of common libraries, such as scikit-multiflow or river, with custom code. Float is open-sourced and can be accessed at https://github.com/haugjo/float. In this sense, we hope that our work will contribute to more standardized, reliable and realistic testing and comparison of online machine learning methods.  ( 2 min )
    Quality Inference in Federated Learning with Secure Aggregation. (arXiv:2007.06236v3 [cs.LG] UPDATED)
    Federated learning algorithms are developed both for efficiency reasons and to ensure the privacy and confidentiality of personal and business data, respectively. Despite no data being shared explicitly, recent studies showed that the mechanism could still leak sensitive information. Hence, secure aggregation is utilized in many real-world scenarios to prevent attribution to specific participants. In this paper, we focus on the quality of individual training datasets and show that such quality information could be inferred and attributed to specific participants even when secure aggregation is applied. Specifically, through a series of image recognition experiments, we infer the relative quality ordering of participants. Moreover, we apply the inferred quality information to detect misbehaviours, to stabilize training performance, and to measure the individual contributions of participants.  ( 2 min )
    FedShuffle: Recipes for Better Use of Local Work in Federated Learning. (arXiv:2204.13169v1 [cs.LG])
    The practice of applying several local updates before aggregation across clients has been empirically shown to be a successful approach to overcoming the communication bottleneck in Federated Learning (FL). In this work, we propose a general recipe, FedShuffle, that better utilizes the local updates in FL, especially in the heterogeneous regime. Unlike many prior works, FedShuffle does not assume any uniformity in the number of updates per device. Our FedShuffle recipe comprises four simple-yet-powerful ingredients: 1) local shuffling of the data, 2) adjustment of the local learning rates, 3) update weighting, and 4) momentum variance reduction (Cutkosky and Orabona, 2019). We present a comprehensive theoretical analysis of FedShuffle and show that both theoretically and empirically, our approach does not suffer from the objective function mismatch that is present in FL methods which assume homogeneous updates in heterogeneous FL setups, e.g., FedAvg (McMahan et al., 2017). In addition, by combining the ingredients above, FedShuffle improves upon FedNova (Wang et al., 2020), which was previously proposed to solve this mismatch. We also show that FedShuffle with momentum variance reduction can improve upon non-local methods under a Hessian similarity assumption. Finally, through experiments on synthetic and real-world datasets, we illustrate how each of the four ingredients used in FedShuffle helps improve the use of local updates in FL.  ( 2 min )
    A Locally Adaptive Interpretable Regression. (arXiv:2005.03350v4 [stat.ML] UPDATED)
    Machine learning models with both good predictability and high interpretability are crucial for decision support systems. Linear regression is one of the most interpretable prediction models. However, the linearity in a simple linear regression worsens its predictability. In this work, we introduce a locally adaptive interpretable regression (LoAIR). In LoAIR, a metamodel parameterized by neural networks predicts percentile of a Gaussian distribution for the regression coefficients for a rapid adaptation. Our experimental results on public benchmark datasets show that our model not only achieves comparable or better predictive performance than the other state-of-the-art baselines but also discovers some interesting relationships between input and target variables such as a parabolic relationship between CO2 emissions and Gross National Product (GNP). Therefore, LoAIR is a step towards bridging the gap between econometrics, statistics, and machine learning by improving the predictive ability of linear regression without depreciating its interpretability.  ( 2 min )
    Semantic Communication: An Information Bottleneck View. (arXiv:2204.13366v1 [cs.IT])
    Motivated by recent success of machine learning tools at the PHY layer and driven by high bandwidth demands of the next wireless communication standard 6G, the old idea of semantic communication by Weaver from 1949 has received considerable attention. It breaks with the classic design paradigm according to Shannon by aiming to transmit the meaning of a message rather than its exact copy and thus potentially allows for savings in bandwidth. In this work, inspired by Weaver, we propose an information-theoretic framework where the semantic context is explicitly introduced into probabilistic models. In particular, for bandwidth efficient transmission, we define semantic communication system design as an Information Bottleneck optimization problem and consider important implementation aspects. Further, we uncover the restrictions of the classic 5G communication system design w.r.t. semantic context. Notably, based on the example of distributed image classification, we reveal the huge potential of a semantic communication system design. Numerical results show a tremendous saving in bandwidth of 20 dB with our proposed approach ISCNet compared to a classic PHY layer design.  ( 2 min )
    On the Use of Dimension Reduction or Signal Separation Methods for Nitrogen River Pollution Source Identification. (arXiv:2204.13182v1 [stat.AP])
    Identification of the current and expected future pollution sources to rivers is crucial for sound environmental management. For this purpose numerous approaches were proposed that can be clustered under physical based models, stable isotope analysis and mixing methods, mass balance methods, time series analysis, land cover analysis, and spatial statistics. Another extremely common method is Principal Component Analysis, as well as its modifications, such as Absolute Principal Component Score. They have been applied to the source identification problems for nitrogen entry to rivers. This manuscript checks whether PCA can really be a powerful method to uncover nitrogen pollution sources considering its theoretical background and assumptions. Moreover, slightly similar techniques, Independent Component Analysis and Factor Analysis, will also be considered.  ( 2 min )
    On the Normalizing Constant of the Continuous Categorical Distribution. (arXiv:2204.13290v1 [stat.ML])
    Probability distributions supported on the simplex enjoy a wide range of applications across statistics and machine learning. Recently, a novel family of such distributions has been discovered: the continuous categorical. This family enjoys remarkable mathematical simplicity; its density function resembles that of the Dirichlet distribution, but with a normalizing constant that can be written in closed form using elementary functions only. In spite of this mathematical simplicity, our understanding of the normalizing constant remains far from complete. In this work, we characterize the numerical behavior of the normalizing constant and we present theoretical and methodological advances that can, in turn, help to enable broader applications of the continuous categorical distribution. Our code is available at https://github.com/cunningham-lab/cb_and_cc/.  ( 2 min )
    Asymptotic Inference for Infinitely Imbalanced Logistic Regression. (arXiv:2204.13231v1 [math.ST])
    In this paper we extend the work of Owen (2007) by deriving a second order expansion for the slope parameter in logistic regression, when the size of the majority class is unbounded and the minority class is finite. More precisely, we demonstrate that the second order term converges to a normal distribution and explicitly compute its variance, which surprisingly once again depends only on the mean of the minority class points and not their arrangement under mild regularity assumptions. In the case that the majority class is normally distributed, we illustrate that the variance of the limiting slope depends exponentially on the z-score of the average of the minority class's points with respect to the majority class's distribution. We confirm our results by Monte Carlo simulations.  ( 2 min )

  • Open

    Should I become an Art therapist
    submitted by /u/BalanceSubstantial66 [link] [comments]
    Is it easier to mimic a model based on its input/output or to train an original model in the first place ?
    An original model is trained with a data-set, typically labeled by humans (say for classifiers). However, what if one would like to copycat a closed model only exposed through an API? In that case, the data-set would instead be the input/output pairs of the original model. Is it easier to train the copycat model or the original one? How much data would be required to train the copycat versus the original one? Are there practical examples of this happening? Can it possibly be worth it, and if so under which circumstances? How much data would be required, for example, for the notorious Dall-e 2? submitted by /u/Wishmaster04 [link] [comments]  ( 1 min )
    Disney princesses according to AI. Is this done manually or through an AI app?
    submitted by /u/p0goniphaft111 [link] [comments]
    Specification gaming: the flip side of AI ingenuity
    submitted by /u/estasfuera [link] [comments]
    Beyond interpretability: developing a language to shape our relationships with AI
    submitted by /u/estasfuera [link] [comments]
    Guide to Iteratively Tuning GNN's
    submitted by /u/aidev2040 [link] [comments]
    Last Week in AI: AI Driving Instructors, MASSIVE Speech Dataset, AI ’Show Stealers’, Tesla Jet Crash
    submitted by /u/regalalgorithm [link] [comments]
    Best way to format dialogue to fine tune GPT-J, 3 ...
    Hi all, a quick question, given that I'm not finding much info online: how would you format a dialogue between people to fine tune a GPT model (be it GPT-J, GPT-3 etc.)? For example, if I want the GPT model to create a new dialogue from the show "Friends" (chosen because all dialogues are available online https://www.kaggle.com/datasets/blessondensil294/friends-tv-series-screenplay-script?resource=download ) how should I format the input? [Scene: Central Perk, Chandler, Joey, Phoebe, and Monica are there.] Monica: There's nothing to tell! He's just some guy I work with! Joey: C'mon, you're going out with the guy! There's gotta be something wrong with him! Chandler: All right Joey, be nice. So does he have a hump? A hump and a hairpiece? Phoebe: Wait, does he eat chalk? (They all stare, bemused.) Phoebe: Just, 'cause, I don't want her to go through what I went through with Carl- oh! Monica: Okay, everybody relax. This is not even a date. It's just two people going out to dinner and- not having sex. Chandler: Sounds like a date to me. submitted by /u/Sgnarf1989 [link] [comments]  ( 1 min )
    Meta's AI team searches for the secret trick of human intelligence
    submitted by /u/much_successes [link] [comments]
    I am new to Artificial Intelligence and I need your worthy suggestions about AI.
    Hey guys, I am new to AI. Please suggest some good and new topics/technologies that are supported by AI which will be very informative and beneficial for me in my career. Every suggestion is a high priority for me. Waiting for your replies. Thanks submitted by /u/adilonreddit1 [link] [comments]  ( 1 min )
    AI Augmentation in this Assemblage 23 music video
    submitted by /u/LightOfAntara [link] [comments]
    How Human Pose Estimation Technology Can Be Used in 2022?
    Hi everyone, I want to share with you an article that I worked on with my colleague about the use cases of human pose estimation. I would love it if you could check it out and share your ideas for using this technology in the comments below. https://mobidev.biz/blog/human-pose-estimation-technology-guide submitted by /u/Data-Power [link] [comments]
    Deepmind Researchers Propose Fair Normalizing Flows (FNF): A Rigorous Approach For Learning Fair Representations
    Fair representation learning has emerged as one of the most promising techniques to encode data into new, impartial representations with high utility as machine learning is increasingly utilized in settings that potentially harm humans. Fair representation means presenting data without regard to gender, color, or other factors. Due to human-introduced bias, these biases are found in the word vector representations in language models. The goal of Learning Fair Representation is to reduce bias by decreasing the semantic distance between biased terms. The goal of fair representation learning is to guarantee that representations are useful for a variety of prediction tasks and that sensitive aspects of the original data cannot be extracted from them. Adversarial training, which combines an encoder aiming to turn data into a fair representation with an adversary attempting to recover sensitive features from the representation, is the most used method for learning fair representations. However, recent research has discovered that these methods do not provide completely fair representations: stronger opponents can recover sensitive features. This could allow malicious or uneducated users to discriminate using the available representations. The issue of fair representation has lately risen in prominence as regulators draft guidelines on the ethical use of AI, indicating that any company that cannot ensure non-discrimination would be held liable for the data produced. Paper: https://openreview.net/pdf?id=BrFIKuxrZE Github: https://github.com/eth-sri/fnf submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    It’s not a place (made with starryai)
    submitted by /u/Losthel [link] [comments]
    Night Cafe "fire on the nuclear plant in the amazon by Simon Stalenhag"
    submitted by /u/brunovianna [link] [comments]
    Ladybugs (A.I. animation + sound design)
    submitted by /u/nenomancer [link] [comments]  ( 1 min )
    Adaptive Multi-Strategy Market-Making Agent For Volatile Markets
    submitted by /u/akolonin [link] [comments]  ( 1 min )
    Artificial Nightmares: Hall Monitor || Clip Guided Diffusion AI Art Video [4K 20 FPS]
    submitted by /u/Thenamessd [link] [comments]
  • Open

    Playing Chess With Offline Reinforcement Learning
    submitted by /u/Bellerb [link] [comments]  ( 1 min )
    How to use RL for combinatorial optimization
    I am trying to use reinforcement learning for combinatorial optimization. I coded up a toy example for the travelling salesman problem but I am confused by something. In stable baselines 3 there is a variable "done". If I set it to the optimum value (that is, the length of the shortest route through all the cities, computed by a different method) the RL algorithm never finds it. What should done be set to? Or more generally, how do you do optimization of this sort using RL? I can just set it so that it is done after 1000 steps but then it is spending a lot of time finding the best way to take the first 1000 steps, which isn't quite the point. submitted by /u/wiggyhat [link] [comments]  ( 2 min )
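    On the `done` question above: in Gym-style environments `done` is a boolean flag returned by `step` that marks the end of an episode, not a target value to reach. A minimal sketch, assuming a hand-rolled toy environment (the class `TinyTSPEnv`, its observation format and the revisit penalty of -10 are all hypothetical and not part of Stable-Baselines3), could end the episode once every city has been visited and express the tour length through the rewards:

```python
import math


class TinyTSPEnv:
    """Hypothetical toy environment with a Gym-like reset/step interface."""

    def __init__(self, cities):
        self.cities = cities
        self.reset()

    def reset(self):
        self.current = 0
        self.visited = {0}
        return self._obs()

    def _obs(self):
        # Observation: current city index plus a visited mask.
        mask = [1 if i in self.visited else 0 for i in range(len(self.cities))]
        return (self.current, tuple(mask))

    def step(self, action):
        if action in self.visited:
            # Revisiting is penalised and does not move the agent.
            return self._obs(), -10.0, False, {}
        reward = -math.dist(self.cities[self.current], self.cities[action])
        self.current = action
        self.visited.add(action)
        # The episode ends when the tour is complete, not at a target length.
        done = len(self.visited) == len(self.cities)
        if done:
            # Close the tour by returning to the start city.
            reward -= math.dist(self.cities[self.current], self.cities[0])
        return self._obs(), reward, done, {}


if __name__ == "__main__":
    env = TinyTSPEnv([(0, 0), (0, 1), (1, 1), (1, 0)])
    total = 0.0
    for city in (1, 2, 3):
        obs, reward, done, info = env.step(city)
        total += reward
    print(total, done)
```

    With this framing the episode return equals the negative tour length, so maximising return minimises the route; a separate step limit is still useful to cut off episodes that keep choosing invalid moves.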
    Can agents act simultaneously with no notion of turn-taking?
    I was reading this paper and in section 3 they claim that agents act simultaneously and there is no notion of turn-taking: https://arxiv.org/abs/2104.07750 I was wondering how this works. What I'm used to seeing is a for loop in which, one after the other, agents execute the step function and interact with the environment. How does this change if all agents act simultaneously? submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
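    One common way to get simultaneous actions, sketched here under assumptions and not taken from the linked paper, is to have the environment accept a joint action (one entry per agent) and resolve every move against the same previous state, so the order in which agents are listed has no effect. The function name `joint_step`, the 1-D grid and the conflict rule are hypothetical:

```python
def joint_step(positions, actions, size=5):
    """Apply all agents' moves at once, based on the same previous state."""
    proposed = {
        agent: max(0, min(size - 1, positions[agent] + actions[agent]))
        for agent in positions
    }
    # Conflict rule (an assumption): agents that target the same cell stay put.
    targets = list(proposed.values())
    return {
        agent: positions[agent] if targets.count(cell) > 1 else cell
        for agent, cell in proposed.items()
    }


if __name__ == "__main__":
    state = {"a": 0, "b": 2}
    print(joint_step(state, {"a": 1, "b": 1}))
    print(joint_step(state, {"a": 1, "b": -1}))
```

    The per-agent for loop then only collects actions from each policy; the environment is stepped once per time step with the whole dictionary.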
    How to mitigate catastrophic forgetting for an intelligent agent using replay memory and Deep Reinforcement Learning algorithm
    Hello everyone. I built a use case to instruct an agent, which simulates a herd of predators, to encircle a prey. Episodes are non-terminal and have a fixed number of steps. The agent uses a neural network to decide which actions to take at each step and is instructed by a remember/replay mechanism using a memory of past events. I started the experiment using absolute coordinates in the state returned by the environment and the result is displayed in the following image, where the total reward values at the end of each episode are displayed. https://preview.redd.it/oscet38c0gw81.png?width=640&format=png&auto=webp&s=092fc92ee56c919bdc1851bb9f1ef9bdc585bfde In this case the results were encouraging. However, I wanted to create a more generic use case, using vectors that represent the position of the prey with respect to predators instead of absolute coordinates. Unfortunately, the best result I was able to obtain is the one shown in the image below where, apparently, performance deteriorates after a certain number of episodes. https://preview.redd.it/yoiygwxe0gw81.png?width=640&format=png&auto=webp&s=06b7082cc0ebb80e3f5e5ce8307d75cd6451f79b It seems to me a case of catastrophic forgetting (I could be wrong) and I managed to mitigate it by increasing the memory, retaining the results of the older episodes, lowering the learning rate of the neural network and using a learning decay algorithm, but I have not succeeded in eliminating the phenomenon completely. Anyone have any advice? submitted by /u/kaeldric__ [link] [comments]  ( 2 min )
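    One way to keep older episodes represented in a bounded replay memory is reservoir sampling, where every transition seen so far has an equal chance of being in the buffer. This is a swapped-in illustration, not the poster's mechanism, and whether it helps in this particular task is untested; the class name `ReservoirReplayBuffer` and its methods are hypothetical.

```python
import random


class ReservoirReplayBuffer:
    """Fixed-size memory holding a uniform random sample of all transitions seen."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.items = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, transition):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(transition)
        else:
            # Keep the new transition with probability capacity / seen,
            # overwriting a uniformly chosen slot.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = transition

    def sample(self, batch_size):
        return self.rng.sample(self.items, min(batch_size, len(self.items)))


if __name__ == "__main__":
    buffer = ReservoirReplayBuffer(capacity=10, seed=0)
    for step in range(1000):
        buffer.add(step)
    print(sorted(buffer.items))
```

    Compared with a first-in-first-out memory, early transitions are not systematically evicted, at the cost of recent experience being less heavily represented.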
    Speeding up custom implementations.
    Hi all! I've been implementing some of the model-free algorithms recently. Compared to open-source libraries, however, my implementations take considerably longer to train in terms of compute time. I wonder what tricks libraries such as stable-baselines3 use to increase frames per second and accelerate policy updates. So far, I have vectorized the environment sampling, and the bottleneck seems to be computing the updates for the agent. Thanks a lot! submitted by /u/Internal-Brush4929 [link] [comments]  ( 1 min )
    Any blog or video series that is available to run TD3 algorithm for path planning purpose . Trained agents deployed in hardware level
    submitted by /u/ajithvallabai [link] [comments]  ( 1 min )
    Microsoft AI Researchers Introduce PPE: A Mathematically Guaranteed Reinforcement learning (RL) Algorithm For Exogenous Noise
    Reinforcement learning (RL) is a machine learning training strategy that rewards desirable behaviors while penalizing undesirable ones. A reinforcement learning agent can perceive and comprehend its surroundings, act, and learn through trial and error in general. Although RL agents can heuristically solve some problems, such as assisting a robot in navigating to a specific location in a given environment, there is no guarantee that they will be able to handle problems in settings they have not yet encountered. The capacity of these models to recognize the robot and any obstacles in its path, but not changes in its surrounding environment that occur independently of the agent, which we refer to as exogenous noise, is critical to their success. Existing RL algorithms are not powerful enoug…  ( 2 min )
  • Open

    [D] Usage Optimized AWS GPUs, 57-63% off On-Demand Prices
    www.usage.ai ​ Usage AI bundles 3-year no-upfront RIs on AWS with guaranteed buyback -- so users get all the savings of 3-year RIs with none of the commitment. I helped engineer the product. Here to answer any questions! submitted by /u/usage-team [link] [comments]
    [P] Blog post: Learning JAX by Learning to Learn
    I recently published a new blog post, which goes over how meta-learned optimizers work and how to implement them in JAX. JAX's composable function transforms make implementing meta-learning algorithms very straightforward. If you're interested in JAX or meta learning give it a read! Blog post: https://teddykoker.com/2022/04/learning-to-learn-jax/ Code: https://github.com/teddykoker/learning-to-learn-jax Original Paper (NeurIPS '16): https://arxiv.org/abs/1606.04474 submitted by /u/tomkoker [link] [comments]  ( 1 min )
    [P] Introducing FlowMeter for network packet analysis
    We’ve released a new open source project - https://github.com/deepfence/FlowMeter - to analyze and classify packet captures using ML techniques. FlowMeter is an experimental project; we’re using it to evaluate how effectively we can train an ML model to discriminate between different types of traffic flows, e.g. normal and anomalous. You can use sample data from various sources (see the README), or gather packet captures using PacketStreamer https://github.com/deepfence/PacketStreamer or other pcap tools. More information in the README, here: https://github.com/deepfence/FlowMeter and this blogpost: https://medium.com/@siddharthsatpathy.ss/introducing-flowmeter-97e0507862b6 Hope some people find it useful; we’d welcome any feedback, thank you. submitted by /u/sidd_ss [link] [comments]  ( 1 min )
    [P] open-source python library for making machine learning demos that runs in the browser or inside a jupyter notebook/google colab, package is available on PyPI
    https://gradio.app/ is for demoing machine learning models https://reddit.com/link/ueod3x/video/75e1ozmjihw81/player
    Prerequisite: Python 3.7+ and that's it!
    Quick Start
    To get Gradio running with a simple "Hello, World" example, follow these three steps:
    1. Install Gradio from pip.
        pip install gradio
    2. Run the code below as a Python script or in a Python notebook (or in a colab notebook).
        import gradio as gr

        def greet(name):
            return "Hello " + name + "!!"

        demo = gr.Interface(fn=greet, inputs="text", outputs="text")

        if __name__ == "__main__":
            demo.launch()
    3. The interface will appear automatically within the Python notebook, or pop in a browser on http://localhost:7860 if running from a script.
    See more in the getting started guide: https://gradio.app/getting_started/ submitted by /u/Illustrious_Row_9971 [link] [comments]  ( 2 min )
    [P] Hot Reloading for Pandas
    Hi guys, I thought you might find my project useful. It's called Reloadium and saves a lot of time during Python development, especially in the data science and machine learning fields. More details here: https://github.com/reloadware/reloadium Features: hot reloading dataframes; modifying and fixing code during debugging. What do you guys think about it? submitted by /u/kwazar90 [link] [comments]  ( 1 min )
    [D] SHAP method to get feature importance: is linearity realistic?
    The SHAP method assumes linear relationship between the feature effects (see definition "Additive feature attribution methods" in the paper). But is this assumption realistic? submitted by /u/savoga [link] [comments]  ( 1 min )
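    On the linearity question above: the additive form applies to the explanation (a base value plus one contribution per feature), not to the model being explained. A small sketch makes this concrete by computing exact Shapley values for a deliberately nonlinear toy model. It replaces absent features with a single baseline value, which is a simplification of what SHAP does with background data, and the names `shapley_values` and `model` are hypothetical rather than part of the `shap` library.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values; absent features are replaced by baseline values."""
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            for subset in combinations(others, size):
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                phi += weight * (value(set(subset) | {i}) - value(set(subset)))
        phis.append(phi)
    return phis


def model(z):
    # Deliberately nonlinear: contains an interaction term.
    return z[0] * z[1] + z[0]


if __name__ == "__main__":
    x, baseline = [2.0, 3.0], [0.0, 0.0]
    phis = shapley_values(model, x, baseline)
    print(phis, sum(phis), model(x) - model(baseline))
```

    The contributions sum to the difference between the prediction and the baseline output even though the model has an interaction term; the interaction effect is simply split between the two features.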
    [D] Need to find a good self-hosted medical image annotation tool.
    Hi. I'm trying to come up with a solution for a medical image annotation system for our laboratory. We need it to be self-hosted (so that it only works inside the uni's internal network), and open source (since the funds are limited). So far I've only found out about https://lab.vindr.ai/dashboard/projects but the documentation is really bad and I could not launch it using docker compose. I've also found MONAILabel (https://github.com/Project-MONAI/MONAILabel), but it apparently requires a GPU, which makes it really expensive. I'd rather find a CPU-based solution because our task is not that complex. We only get some DICOM files (each has studies in it), and want to label them. submitted by /u/feryet [link] [comments]  ( 1 min )
    Machine Learning in Fantasy Premier League? [D]
    https://medium.com/@subin.sen7/our-client-is-the-worlds-best-fpl-player-this-is-not-clickbait-51481fae76c9 Is it possible to use models to beat the market in Fantasy Football? submitted by /u/AI_FPL [link] [comments]
    [P] XGboost, sklearn and others running over encrypted data
    Hello everyone! Following this post numpy in fhe we are releasing a new lib that allows popular machine learning frameworks to run over encrypted data: https://github.com/zama-ai/concrete-ml Currently this supports xgboost and many sklearn models. We also support pytorch to some extent. We are trying to closely follow sklearn API (when relevant) to make the use easy to machine learning practitioners. Happy to hear any feedback on this ! submitted by /u/strojax [link] [comments]  ( 1 min )
    [D] Is Tensorflow.js still much slower than Tensorflow
    Last time I used Tensorflow was 3 years ago and it was a resource hog as well as very slow. I intend to get back into machine learning and wondering if I should give Tensorflow.js a go again using the Nodejs backend since this will be an electron app. Or should I just go straight for Tensorflow. Ideally I will need about 15 fps while taking up minimal resources submitted by /u/manrayboy [link] [comments]  ( 1 min )
    [D] Collaboration-first machine learning platform that enables you to build, train, track, and share your ML projects simply with a few lines of code
    Hey r/MachineLearning, I'm Derrick from Layer (layer.ai) - the collaboration-first machine learning platform that enables you to build, train, track, and share your ML projects simply with a few lines of code. We are soft-launching today! I’ve been working on Layer for the past 2 years with an awesome team around the world. We really poured our hearts and minds into Layer and hope you will like it. Your feedback would be very appreciated! Layer Demo To get started, you can simply run our Quickstart Example! How is Layer different from other tools? Although there are plenty of ML and DS tooling products, we believe that there is still a large gap around collaboration. Many data science projects are hosted on GitHub, which, in our experience, does not provide sufficient depth and abs…  ( 4 min )
  • Open

    How to Tune Graph Neural Networks
    submitted by /u/aidev2040 [link] [comments]
  • Open

    A one-up on motion capture
    A new neural network approach captures the characteristics of a physical system’s dynamic motion from video, regardless of rendering configuration or image differences.  ( 7 min )
    Engineers use artificial intelligence to capture the complexity of breaking waves
    Their model’s predictions should help researchers improve ocean climate simulations and hone the design of offshore structures.  ( 6 min )
  • Open

    Extracting Skill-Centric State Abstractions from Value Functions
    Posted by Dhruv Shah, Intern, and Brian Ichter, Research Scientist, Robotics at Google Advances in reinforcement learning (RL) for robotics have enabled robotic agents to perform increasingly complex tasks in challenging environments. Recent results show that robots can learn to fold clothes, dexterously manipulate a rubik’s cube, sort objects by color, navigate complex environments and walk on difficult, uneven terrain. But "short-horizon" tasks such as these, which require very little long-term planning and provide immediate failure feedback, are relatively easy to train compared to many tasks that may confront a robot in a real-world setting. Unfortunately, scaling such short-horizon skills to the abstract, long horizons of real-world tasks is difficult. For example, how would one trai…  ( 7 min )
  • Open

    Techniques to Write Better Python Code
    We write a program to solve a problem or to make a tool with which we can repeatedly solve similar problems. For the latter, it is inevitable that we come back to revisit the program we wrote, or that someone else reuses the program we wrote. There is also a chance that we will encounter data […] The post Techniques to Write Better Python Code appeared first on Machine Learning Mastery.  ( 15 min )
  • Open

    How to get AI to confuse a shark with a clam
    "The Megalodon was a large bivalve, measuring up to 2.5 meters in length. Its shell was covered in spines, and it had a large, powerful jaw for crushing prey." Although the megalodon is the most widely known as a giant prehistoric shark, I recently learned that Megalodon  ( 4 min )
    Bonus: GPT-3 answers my questions about sawflies, badly
    AI Weirdness: the strange side of machine learning  ( 1 min )
  • Open

    ‘I Doubt, Therefore I Am,’ Said AI
    It’s Monday morning, and Paul opens one of several emails sent by his boss, Heather. Her email seems a bit unusual, asking Paul to rush to… Continue reading on Becoming Human: Artificial Intelligence Magazine »  ( 4 min )
  • Open

    Designing Societally Beneficial Reinforcement Learning Systems
    Deep reinforcement learning (DRL) is transitioning from a research field focused on game playing to a technology with real-world applications. Notable examples include DeepMind’s work on controlling a nuclear reactor or on improving Youtube video compression, or Tesla attempting to use a method inspired by MuZero for autonomous vehicle behavior planning. But the exciting potential for real world applications of RL should also come with a healthy dose of caution - for example RL policies are well known to be vulnerable to exploitation, and methods for safe and robust policy development are an active area of research. At the same time as the emergence of powerful RL systems in the real world, the public and researchers are expressing an increased appetite for fair, aligned, and safe machine …  ( 8 min )

  • Open

    Decision Transformers with Hugging Face
    submitted by /u/hellopaperspace [link] [comments]
    How do you get a global observation?
    Naive question: how do you pass a global observation of the environment to your actor and critic? In other words, is a global observation always available or does it depend on the environment? Can you give me a couple of examples? Thanks :) submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
    Why do Monte Carlo methods have low bias compared to TD methods in RL? Does the bias term in RL have the same meaning as in general ML terminology?
    submitted by /u/aabra__ka__daabra [link] [comments]  ( 1 min )
    2nd Neural MMO challenge is out! Design your policy to master PvE and PvP challenge.
    submitted by /u/xiaolongzhu [link] [comments]  ( 1 min )
    Parallel environments affect Q learning performance
    Parallel environments can make learning much more efficient, but sometimes using them seems to affect performance. Does anyone know why, and how to solve it? submitted by /u/Traditional-Ad4492 [link] [comments]  ( 1 min )
    Papers for Green Vehicle Routing Problem
    My current work requires me to work with GVRP. The specific problem is for an EV (the agent) to find the optimal route for delivering items, based on charge consumption and without running out of charge. I'm currently modelling it as a graph data structure, although it could be different. Are there any state-of-the-art papers (both RL-based and exact-method-based) that I can look into? I'm also interested in earlier papers that can help me build understanding, so that I can increase complexity slowly. Thanks for the help! submitted by /u/evilBotman [link] [comments]  ( 1 min )
    What is the current SOTA for single-threaded continuous-action control using RL?
    As above. I am interested in RL for robotics, specifically for legged locomotion. I wish to explore RL training on the real robot. Sample efficiency is paramount. Has any progress been made by utilizing, say, RNNs/LSTMs or even attention? submitted by /u/pakodanomics [link] [comments]  ( 1 min )
    Is it possible/useful to allow the agent to influence the observations in later timesteps?
    In my current environment, the agent is fed observations from a particular mathematical computation based upon the underlying state. This computation has hyper-parameters that influence the resulting observation given to the agent. Does it make any sense whatsoever to give these hyper-parameters to the agent within the action space? In this way the agent would be able to adjust its future observations, and perhaps it would be able to use this in a smart way. On the other hand, I've heard that changing the environment from under the agent's feet during training can lead to major issues with training. One can imagine that learning a dynamic environment is harder than a static one. Any thoughts? submitted by /u/C_BearHill [link] [comments]  ( 1 min )
    treequeues: transfer JAX pytrees between processes at very high speed!
    Hello! If you are using JAX and you need to pass some pytrees between processes, I may have something for you :) I developed a "treequeue". It is a queue that is made for a pytree's nested arrays. The transfer speed is up to 10 times higher than regular queues. This is done by utilizing shared memory arrays and avoiding pickling data. This can be very useful when developing distributed architectures, e.g. distributed reinforcement learning, where speed is of the utmost importance. In my case this implementation was very useful for removing bottlenecks when implementing RL PBT algorithms! https://github.com/thomashirtz/treequeues Cheers! submitted by /u/krenast [link] [comments]  ( 1 min )
    Could you recommend a paper on self-supervised RL related to catastrophic forgetting?
    Hi, I have a problem where the agent forgets good states during self-supervised pre-training. It shows good exploration at first, but as time goes by only the edge cases are explored, leading to poor performance at fine-tuning. I found a paper about this before, but it is really hard to find again. It was related to reward. Could you recommend a paper related to this? Thanks for reading. submitted by /u/Spiritual_Fig3632 [link] [comments]  ( 1 min )
  • Open

    [D] Any independent researchers ever get published/into conferences?
    For personal context: I worked overtime during my bachelor's and took grad classes and started research, but my financial situation took a turn for the worse and, well, let's say I don't have enough money for a master's or to continue my studies. I'm job hunting now, but it's a chicken-and-egg problem: most of these jobs want at the very least research experience (which I know how to conduct), for which they have the resources. Either way, I have a couple of research topics I want to explore, but I have limited resources and want to be realistic here. Main question: do you know of anyone who has done it in a non-grad, non-institutional way, without the help of established figures in the field? I only know of those who have done it under an established researcher's team, a university, or a company. Add-on: I would appreciate any personal experiences as well, and anything on conferences, how long it takes, etc. My research experience has been largely under NDAs, so I haven't experienced formal publishing yet (but want to!) submitted by /u/robml [link] [comments]  ( 2 min )
    [D] feature embeddings extraction for image classification
    Is there any rule of thumb for deciding which layer to extract feature embeddings from for kNN-based classification tasks? Does it depend on the architecture (conv nets vs. ViT models, etc.)? Should I extract before or after the activation function? Edit: Which model do you suggest starting from? Currently I'm using a ResNet-50 trained with DINO. submitted by /u/Rich_Freedom98 [link] [comments]  ( 1 min )
    [P] Terraform Provider Iterative (TPI) - plugin for ML/AI workloads to spot instance recovery & auto-termination on AWS, GCP, Azure, Kubernetes
    Terraform Provider Iterative (TPI) addresses the specific needs of machine learning teams. It is an open-source tool extending the functionality of Terraform, the world's most widely used multi-cloud provisioning product. The tool enables full lifecycle management of computing resources and is designed specifically for machine learning pipelines. It aims to bridge the gap between devops and data science teams by building on top of Terraform, a tool universally familiar to devops teams, and extending it to suit machine learning needs. It provides the following advantages for your ML workflow: Lower cost: use your preferred cloud provider's existing pricing, including on-demand per-second billing and bulk discounts. Auto-recovery: spot/preemptible instances are cheap but unreliable. TPI reliably and automatically respawns such interrupted instances, caching & restoring the working directory in the cloud even when you are offline. Custom spec: full control over hardware & software requirements via a single config file. submitted by /u/cmstrump [link] [comments]  ( 1 min )
    [R] Flamingo: a Visual Language Model for Few-Shot Learning (from DeepMind)
    Paper (pdf). A link to the paper is also in this blog post. Abstract: Building models that can be rapidly adapted to numerous tasks using only a handful of annotated examples is an open challenge for multimodal machine learning research. We introduce Flamingo, a family of Visual Language Models (VLM) with this ability. Flamingo models include key architectural innovations to: (i) bridge powerful pretrained vision-only and language-only models, (ii) handle sequences of arbitrarily interleaved visual and textual data, and (iii) seamlessly ingest images or videos as inputs. Thanks to their flexibility, Flamingo models can be trained on large-scale multimodal web corpora containing arbitrarily interleaved text and images, which is key to endow them with in-context few-shot learning capabilities. We perform a thorough evaluation of the proposed Flamingo models, exploring and measuring their ability to rapidly adapt to a variety of image and video understanding benchmarks. These include open-ended tasks such as visual question-answering, where the model is prompted with a question which it has to answer, captioning tasks, which evaluate the ability to describe a scene or an event, and close-ended tasks such as multiple choice visual question-answering. For tasks lying anywhere on this spectrum, we demonstrate that a single Flamingo model can achieve a new state of the art for few-shot learning, simply by prompting the model with task-specific examples. On many of these benchmarks, Flamingo actually surpasses the performance of models that are fine-tuned on thousands of times more task-specific data. submitted by /u/Wiskkey [link] [comments]  ( 1 min )
    [D] Self-Organizing Maps and Principal Component Analysis.
    Hello all, I am doing my PhD on numerical modelling of solar radiation and I am researching SOMs as a possible tool. I have this problem where I can't decide on a network size, so I have been trying to develop some means to compare SOM neurons from different-sized networks. I have read that the 2-D SOM array tends to span the two highest-variance principal components. So would it be adequate to tag my SOM nodes with their projections onto the first two principal components, so that I can compare results from different experiments? submitted by /u/juliancanellas [link] [comments]  ( 1 min )
    "[Discussion]", Same MAE Result
    Hello guys, I'm wondering if you can help: we use machine learning models to identify trading opportunities using historical trading data of price and market indicators. The data we use is fine: abundant, accurate and (manually) proven to work over several non-sequential months in the past. The problem is that we keep on getting the exact same MAE result. Any suggestions on solving this issue would be immensely appreciated. submitted by /u/AFAC1410 [link] [comments]  ( 1 min )
    [P] New blog post: how to automatically find label errors in your audio datasets!
    Hi folks, our blog post on how to automatically find label errors in audio datasets has just gone live. We cover the steps to: ⛏️ Perform feature extraction (aka embeddings) on the Spoken Digit dataset with a pre-trained PyTorch model. 🔢 Use cross-validation to generate out-of-sample predicted probabilities for every example in the dataset. 🏷️ Run one line of cleanlab code on these predicted probabilities to identify which audio clips may be mislabeled. 📰 Blog Post + Google Colab: https://cleanlab.ai/blog/label-errors-audio-datasets/ https://preview.redd.it/vpzhwg6jebw81.png?width=1260&format=png&auto=webp&s=3fa79d8a097ae5936e5e6a51e87508adf6190835 submitted by /u/weijinglok [link] [comments]  ( 1 min )
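    The thresholding idea behind this kind of label-error detection can be sketched in plain Python. This is a simplified illustration, not cleanlab's actual implementation or API: the function names `class_thresholds` and `find_label_issues_sketch` are hypothetical, and the inputs are assumed to be integer labels plus out-of-sample predicted probabilities from cross-validation.

```python
def class_thresholds(labels, pred_probs, n_classes):
    """Per-class threshold: mean predicted probability of class k
    over the examples whose given label is k."""
    sums = [0.0] * n_classes
    counts = [0] * n_classes
    for y, p in zip(labels, pred_probs):
        sums[y] += p[y]
        counts[y] += 1
    # A class with no labeled examples gets threshold 1.0 (rarely reached).
    return [s / c if c else 1.0 for s, c in zip(sums, counts)]


def find_label_issues_sketch(labels, pred_probs, n_classes):
    """Return indices whose confidently predicted class differs from the given label."""
    thresholds = class_thresholds(labels, pred_probs, n_classes)
    issues = []
    for i, (y, p) in enumerate(zip(labels, pred_probs)):
        # Classes the model is "confident" about for this example.
        confident = [k for k in range(n_classes) if p[k] >= thresholds[k]]
        if confident:
            best = max(confident, key=lambda k: p[k])
            if best != y:
                issues.append(i)
    return issues


if __name__ == "__main__":
    labels = [0, 0, 0, 1, 1, 1]
    pred_probs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9],
                  [0.2, 0.8], [0.1, 0.9], [0.3, 0.7]]
    print(find_label_issues_sketch(labels, pred_probs, n_classes=2))
```

    In practice the library adds further pruning and ranking on top of this idea, so treat the sketch only as a way to see why out-of-sample probabilities are needed.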
    Shazam App for your brain? [R]
    https://arxiv.org/abs/2202.03265?context=eess Computer vision approach to processing naturalistic brain responses to music. Models achieved state of the art and they used publicly available datasets. With 1 sec of brain signal they can classify the name of the song you're listening to and how much you enjoyed it ~89%. Imagine this but on your airpods, seems wild. submitted by /u/blackliquerish [link] [comments]
    [D] What roles were you able to receive as a PhD with no top tier research?
    It’s looking more and more like I won’t be able to get any top tier publications in my PhD (NeurIPS, ICLR, etc.). I’ve tried, but various factors have limited the experience. I’m curious what people were able to do in this circumstance. I’ll have 3-5 publications in mid-tier venues, plus many years of experience as a software engineer and machine learning engineer (in non-tech companies). People in my program have gone on to work at Amazon, and a few Google, but these were all applied scientist and engineering roles. Does anyone go on to work as a research scientist? National labs seem like a good bet, I’m curious what others have done. submitted by /u/walterkronkite33 [link] [comments]  ( 5 min )
    [D] Have we stopped researching agents?
    It seems to me that the ML research community has stopped talking about agents? As in, machines that act in complex environments by themselves? Ever since GPT-3 came out, basically every single paper has been related to NLP or Transformers in general. It seems a natural next step would be to try to get agents we could tell what to do and how to do it in natural language, no? Yet that has not been the focus at all. The last time we had widespread focus on planning and acting was the StarCraft challenge, but that was "solved" so quickly nothing much came of it. Research I know about: - Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents - Tesla in general I guess, with their cars and maybe robots. Is it just me or is there just no interest in researching more capable agents right now? submitted by /u/ReasonablyBadass [link] [comments]  ( 2 min )
    [D] Advertisement Allocation models being used?
    There are a bunch of papers on different advertisement allocation techniques (placing ads to valid ad spots) and in algorithms in grad school I learned MaxFlow/MinCut, but I'm wondering what is actually being used in production systems? Linear models, genetic algorithms, reinforcement learning, some neural network? I know that very little of this is published/talked about due to being proprietary. However, I'm wondering because so much gets published, but the more advanced algorithms tend to be complex and not necessarily possible to be deployed in production systems (what I suspect)? submitted by /u/Flipper3 [link] [comments]  ( 2 min )
    [D] ML model dev pipelines
    Hi Folks! It'd really help if you can participate in this poll and share about your ML/DL model workflow before moving it to production. Out of these steps, which one is your most preferred way of integrating ML into your app or business (not specific to any domain): View Poll submitted by /u/fgp121 [link] [comments]  ( 1 min )
    [R] RL papers relating to Green Vehicle Routing Problem
    My current work requires me to work with GVRP. The specific problem is for an EV (the agent) to find the optimal route for delivering items, based on charge consumption and without running out of charge. I'm currently modelling it as a graph data structure, although it could be different. Are there any state-of-the-art papers (both RL-based and exact-method-based) that I can look into? I'm also interested in earlier papers that can help me build understanding, so that I can increase complexity slowly. Thanks for the help! submitted by /u/evilBotman [link] [comments]  ( 1 min )
    [D] What is the recommended way to estimate norm for gradient clipping?
    I have read several blogs in which they specified that you should clip your gradients to the largest value that doesn't cause exploding gradients. But that also means that you could get a situation that you don't need to do gradient clipping at all. Could that hurt convergence speed in some way? Is there any guideline for how to pick this hyperparameter based on learning rate, batch size, model size, input size (seq len, img size) etc.? Or is this project/use case dependent and it represents "just another hyperparameter"? submitted by /u/Icy_Fisherman7187 [link] [comments]  ( 1 min )
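    One heuristic that is sometimes used, sketched here as an assumption rather than an established rule: log the global gradient norm over a stretch of unclipped training steps, then set the clipping threshold at a high percentile of the observed norms. The helper names and the 90th-percentile default below are illustrative choices, and the gradients are plain Python lists rather than framework tensors.

```python
import math


def global_norm(grads):
    """L2 norm of all gradient entries taken together."""
    return math.sqrt(sum(g * g for g in grads))


def clip_by_global_norm(grads, max_norm):
    """Rescale grads so their global L2 norm is at most max_norm."""
    norm = global_norm(grads)
    if norm <= max_norm or norm == 0.0:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]


def percentile_threshold(norm_history, pct=90.0):
    """Pick a clip threshold from logged norms (simple index-based percentile)."""
    ordered = sorted(norm_history)
    idx = min(len(ordered) - 1, int(len(ordered) * pct / 100.0))
    return ordered[idx]


if __name__ == "__main__":
    # Hypothetical norms logged during a warm-up run without clipping.
    history = [0.8, 1.1, 0.9, 1.3, 7.5, 1.0, 1.2, 0.7, 1.4, 1.1]
    threshold = percentile_threshold(history, pct=90.0)
    print(threshold, clip_by_global_norm([3.0, 4.0], threshold))
```

    With a threshold chosen this way, clipping only touches the occasional outlier step, which is one reason it often behaves like "just another hyperparameter" that rarely needs fine tuning.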
    [P] HuSpaCy: Industrial-strength Hungarian NLP
    I'd like to show off a Hungarian NLP pipeline which we've been heavily improving over the past year. https://github.com/huspacy/huspacy While processing Hungarian texts might not be interesting for the most of you, I believe this project can be a good learning resource, as our models utilize the latest NLP technologies from Explosion and are fully reproducible: We've created a transformers-based model on top of a language specific BERT model Multi-task and transfer-learning is heavily used across the models. Incorporated edit-tree lemmatization and biaffine parsing from spacy-experimental We provide word embeddings using the (fastText-like) floret tool submitted by /u/oroszgy [link] [comments]  ( 1 min )
    [D] How do you decide the ranges for hyperparameters when doing a grid search or a random search?
    I'm trying to tune the parameters of two models - a random forest classifier, and a gradient boosting classifier. When using a grid search or a random search (I might also play around with the genetic algorithm), what are appropriate ranges to use / how can I come to know them? I obviously can't just specify a very arbitrary range because that might decrease the effectiveness of the random search, and make the grid search too long. Do they depend on the number of features in the dataset or other characteristics (e.g. our dataset is quite imbalanced and we're using SMOTE resampling)? I'm just trying to tune n_estimators and max_depth, but RandomForestClassifier also has many other parameters, do I just experiment with all of them, or are there any known parameters that don't do anything useful unless I'm trying for an extra 0.1% of accuracy or recall? Basically the same questions as above for the GradientBoostingClassifier. Letting go of specific models for a second, I'd appreciate generic pointers on how to deliberately tune hyperparameters or specify ranges instead of just arbitrarily specifying sth and hoping it works. Thanks! submitted by /u/stuffingmybrain [link] [comments]  ( 2 min )
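    As a rough sketch of one common practice (not a definitive recipe): sample parameters that act on a multiplicative scale, such as the number of trees or a learning rate, log-uniformly, and sample small integer parameters such as tree depth uniformly. The parameter names `n_estimators` and `max_depth` come from the post; `learning_rate`, every range shown, and the toy scoring function are assumptions for illustration only.

```python
import math
import random


def log_uniform(rng, low, high):
    """Sample so that each order of magnitude in [low, high] is equally likely."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def sample_config(rng):
    """Draw one hyperparameter configuration (ranges are illustrative assumptions)."""
    return {
        "n_estimators": int(round(log_uniform(rng, 50, 1000))),
        "max_depth": rng.randint(2, 20),
        "learning_rate": log_uniform(rng, 1e-3, 0.3),
    }


def random_search(score_fn, n_trials=50, seed=0):
    """Evaluate n_trials random configurations and return the best one with its score."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = sample_config(rng)
        score = score_fn(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


if __name__ == "__main__":
    # Toy stand-in for a cross-validated metric: prefers max_depth near 8.
    cfg, score = random_search(lambda c: -abs(c["max_depth"] - 8), n_trials=30)
    print(cfg, score)
```

    A typical workflow is to start with wide ranges like these, look at which region the best trials fall in, and then narrow the ranges for a second round or a small grid.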
    How to do meaningful work as an independent researcher? [Discussion]
    With big players like OpenAI and Google building these massive models, how do independent researchers without access to such scale and compute do meaningful work? I came across tweets from researchers, especially ones working on generative models, saying they feel their work looks irrelevant after seeing results from DALL-E 2. It feels like just a couple of years ago, if you had a decent GPU setup, you could pretty much do world-class research. It doesn't look like that anymore. Are there any research directions that offer a level playing field, where compute and scale are not necessarily the solution, or are we all doomed to be prompt engineers for GPT models? submitted by /u/HairyIndianDude [link] [comments]  ( 4 min )
  • Open

    How long before DALL-E 2 (or a similarly capable model) produces porn?
    DALL-E 2 was released a few weeks ago and is exceptionally capable of generating images from a text description. However, training data containing porn has been filtered out, as have requests that contain NSFW terms, so it is not suited to generating porn for now. How long will it take before (1) someone builds something as capable that allows porn, or (2) OpenAI opens DALL-E 2 to such content? (Edit) This is an open discussion, but there is clear evidence that the technology now exists to create, on demand and instantly, any kind of porn with any kind of kink in many styles, including photorealistic. submitted by /u/exotic_deviantcy [link] [comments]
    Augmented Military Soldiers
    We often believe that future soldiers are going to be robots. There will no longer be human "boots on the ground" thanks to artificial intelligence, drones, etc. Many of us fail to realize that current soldiers, in militaries across the globe, are being augmented with AI as we speak. Things like AR headsets, computerized sights, and other technologies are going to turn them into cyborgs...Another one of the intriguing aspects of this is that major companies like Microsoft are creating these technologies for the military. How long until we see augmented soldiers going against one another? https://www.linkedin.com/pulse/augmented-soldier-mvyl-associates-1f/?trackingId=FnvaWMtxMsfm2fJeQP1bgg%3D%3D submitted by /u/IsabeldeMontoya [link] [comments]  ( 1 min )
    A Parable Of Explainability
    submitted by /u/elcric_krej [link] [comments]
    AI Dream 18 - Visual Trip through Wonderland
    submitted by /u/LordPewPew777 [link] [comments]
    Community College AI Program Needs
    I am the instructor and lead of the AI program at a community college in North Carolina. I am developing the program currently and have money to spend on Cool Educational Robotics and potentially other tools. Does anybody have any recommendations for robots to teach python with classic search, reinforcement learning methods, or other university level algorithms? A robotic arm would be nice, but I need to tie it directly to a potential project in machine learning, deep learning, reinforcement learning, search, logic, computer vision, etc. We have a NAO robot and are getting a LIMO: https://www.robotlab.com/store/limo-agilex But we could probably use an Educational Robotic Arm Thanks! Our program is one of the first Associates programs in AI in the country. Check out our program website. It is completely available online and we accept out of state students. https://www.waynecc.edu/programs/ai/ Feel free to ask any questions as well. *Also, advice on best value cloud computing virtualized resources for high performance computing for deep learning projects is appreciated. We are currently planning on going with Microsoft's Data Science Virtual Machines but not sure which series exactly submitted by /u/Wayne_CC_AI [link] [comments]  ( 1 min )
    PeopleLens AI Helps The Blind | Brain Fingerprints Detect Autism | AI Predicts Cancer Tumor Regrowth
    submitted by /u/getrich_or_diemining [link] [comments]
    Uhhhh
    submitted by /u/Blazeolmo [link] [comments]
    Dall-E 2 access
    Hello everyone, I wanted to know if one could get access to the dall-e 2 app without having to hope to be chosen from the wait list. Thanks! submitted by /u/Swaggyswaggerson [link] [comments]  ( 1 min )
    A brief history of deepfakes - how it started and where it might take us
    submitted by /u/much_successes [link] [comments]
    Artificial Intelligence of Things (AIoT) - Latest Technology, Future Upcoming Trends and Forecast to 2028
    Our latest research report on the Artificial Intelligence of Things (AIoT) market shows how that market is changing and how trends in demographics, business cycles, and microeconomics affect the market as a whole. Our study of the global AIoT market shows the state of the business by looking at production value and key regions. The market report provides a full analysis of sales volume, pricing, revenue, profit margin, and the rate of expansion within the AIoT market. Get A Free Sample Report @ https://www.intelligencemarketreport.com/report-sample/573982 Key information extracted from the "Artificial Intelligence of Things" report: Extensive information on factors estimated to affect market growth and market share during the forecast period is presented in the report. The report offers the present scenario and future growth prospects of the market in various geographical regions. The competitive landscape analysis of the market, as well as qualitative and quantitative information, is delivered. The SWOT analysis is conducted along with Porter's Five Forces analysis. The in-depth analysis provides an insight into the market, underlining the growth rate and opportunities offered in the business. Leading key players included in this report are: Twilio Inc. · ShiftPixy Inc. · Micron Technology · Intel · IBM · Gopher Protocol · Deep Vision · Ceva · ALCES · AISPEECH submitted by /u/Purva_Duggal [link] [comments]
    Free Webinar on Automated CV pipelines | Video Predictions
    Automated CV Pipelines 4th part is open for registration. It will be covering some of the best practices for video-specific annotation tasks. If you are interested you can check out the details here! submitted by /u/WeekendClassic [link] [comments]
    Stairway to (A.I. animation + sound design)
    submitted by /u/nenomancer [link] [comments]  ( 1 min )
    Ancient civs always burn (A.I. animation + some sound design)
    submitted by /u/nenomancer [link] [comments]
    Treasure Planet made with starryai
    submitted by /u/Losthel [link] [comments]
    Elon Musk's NEURALINK vs Bryan Johnson's KERNEL (No Surgery)
    submitted by /u/1024cities [link] [comments]  ( 1 min )
    Sentiment, cognitive distortions and market data time series - causal analysis for crypto markets
    submitted by /u/akolonin [link] [comments]  ( 1 min )
    High Tech Hacks 2022 ! !
    Hey guys! I’m excited to share with you an exciting upcoming hackathon, High Tech Hacks 2.0! High Tech Hacks is a free, international 24-hour hackathon on May 21-22nd, 2022 open to all high schoolers hoping to learn a new coding skill, compete for awesome prizes, or work with other like-minded hackers. Let’s invent, create, and push the boundaries of technology (as much as we can at one hackathon)! What to expect: Last year, participants learned the basics of web development, Python, virtual reality, and how to make a Discord bot from current software engineers at Microsoft, Amazon, Twilio, other tech companies, and Columbia University SHPE. Thanks to our company sponsors, each participant last year received nearly $400 worth of free software and swag. Register to earn FREE swag (t-shirts, water bottles, stickers!) Network with other passionate STEM high school students from around the world! (Last year we had participants from 26 countries signed up already!) This year we have even bigger prizes, competitions, and speakers so stay tuned! Reach out to me with more questions or email hightechhackathon@gmail.com. Happy hacking! :D Sign up here to confirm your interest and get on our mailing list: Click Here to Register! Also, meet other hackers by Joining our Discord! For more, Check out our Website submitted by /u/HighTechHacks [link] [comments]  ( 1 min )
    MyStyle: The Best AI Face Manipulation to Date!
    submitted by /u/OnlyProggingForFun [link] [comments]  ( 1 min )
  • Open

    Amazon Rekognition introduces Streaming Video Events to provide real-time alerts on live video streams
    Today, AWS announced the general availability of Amazon Rekognition Streaming Video Events, a fully managed service for camera manufacturers and service providers that uses machine learning (ML) to detect objects such as people, pets, and packages in live video streams from connected cameras. Amazon Rekognition Streaming Video Events sends them a notification as soon as […]  ( 8 min )
    3xLOGIC uses Amazon Rekognition Streaming Video Events to provide intelligent video analytics on live video streams to monitoring agents
    3xLOGIC is a leader in commercial electronic security systems. They provide commercial security systems and managed video monitoring for businesses, hospitals, schools, and government agencies. Managed video monitoring is a critical component of a comprehensive security strategy for 3xLOGIC’s customers. With more than 50,000 active cameras in the field, video monitoring teams face a daily […]  ( 4 min )
    Abode uses Amazon Rekognition Streaming Video Events to provide real-time notifications to their smart home customers
    Abode Systems (Abode) offers homeowners a comprehensive suite of do-it-yourself home security solutions that can be set up in minutes and enables homeowners to keep their family and property safe. Since the company’s launch in 2015, in-camera motion detection sensors have played an essential part in Abode’s solution, enabling customers to receive notifications and monitor […]  ( 6 min )
    Pandas user-defined functions are now available in Amazon SageMaker Data Wrangler
    Amazon SageMaker Data Wrangler reduces the time to aggregate and prepare data for machine learning (ML) from weeks to minutes. With Data Wrangler, you can select and query data with just a few clicks, quickly transform data with over 300 built-in data transformations, and understand your data with built-in visualizations without writing any code. Additionally, […]  ( 4 min )
    How Searchmetrics uses Amazon SageMaker to automatically find relevant keywords and make their human analysts 20% faster
    Searchmetrics is a global provider of search data, software, and consulting solutions, helping customers turn search data into unique business insights. To date, Searchmetrics has helped more than 1,000 companies such as McKinsey & Company, Lowe’s, and AXA find an advantage in the hyper-competitive search landscape. In 2021, Searchmetrics turned to AWS to help with […]  ( 5 min )
    Identify paraphrased text with Hugging Face on Amazon SageMaker
    Identifying paraphrased text has business value in many use cases. For example, by identifying sentence paraphrases, a text summarization system could remove redundant information. Another application is to identify plagiarized documents. In this post, we fine-tune a Hugging Face transformer on Amazon SageMaker to identify paraphrased sentence pairs in a few steps. A truly robust […]  ( 10 min )
    How Moovit turns data into insights to help passengers avoid delays using Apache Airflow and Amazon SageMaker
    This is a guest post by Moovit’s Software and Cloud Architect, Sharon Dahan. Moovit, an Intel company, is a leading Mobility as a Service (MaaS) solutions provider and creator of the top urban mobility app. Moovit serves over 1.3 billion riders in 3,500 cities around the world. We help people everywhere get to their destination […]  ( 7 min )
  • Open

    How DNEG Helped Win Another Visual-Effects Oscar by Bringing ‘Dune’ to Life With NVIDIA RTX
    Featuring stunning visuals from futuristic interstellar worlds, including colossal sand creatures, Dune captivated audiences around the world. The sci-fi film picked up six Oscars last month at the 94th Academy Awards, including for Best Sound and Visual Effects. Adapted from Frank Herbert’s 1965 novel of the same name, Dune tells the story of Paul Atreides, Read article > The post How DNEG Helped Win Another Visual-Effects Oscar by Bringing ‘Dune’ to Life With NVIDIA RTX appeared first on NVIDIA Blog.  ( 4 min )
    Your Odyssey Awaits: Stream ‘Lost Ark’ to Nearly Any Device This GFN Thursday
    It’s a jam-packed GFN Thursday. This week brings the popular, free-to-play, action role-playing game Lost Ark to gamers across nearly all their devices, streaming on GeForce NOW. And that’s not all. GFN Thursday also delivers an upgraded experience in the 2.0.40 update. M1-based MacBooks, iMacs and Mac Minis are now supported natively. Plus, membership gift […] The post Your Odyssey Awaits: Stream ‘Lost Ark’ to Nearly Any Device This GFN Thursday appeared first on NVIDIA Blog.  ( 4 min )
  • Open

    Top 11 Retail Technology Trends For 2022
    The 2019 pandemic became a catalyst for retail businesses and their customers. Continue reading on Becoming Human: Artificial Intelligence Magazine »  ( 6 min )
  • Open

    How can we reduce the carbon footprint of global computing?
    Workshop hosted by MIT’s Climate and Sustainability Consortium, MIT-IBM Watson AI Lab, and the MIT Schwarzman College of Computing highlights how new approaches to computing can save energy and help the planet.  ( 8 min )
    Aging Brain Initiative awards fund five new ideas to study, fight neurodegeneration
    Competitive seed grants launch yearlong investigations of novel hypotheses about potential causes, biomarkers, treatments of Alzheimer’s and ALS.  ( 6 min )
  • Open

    Help to create a simple Neural Network... or find the bug
    Hello, I really need help creating a simple neural network that approximates the Rosenbrock function (a function of two variables). I have used the book "MATLAB Deep Learning" by Phil Kim, which provides code examples. However, when I make a simple example where the network is trained with backpropagation, the output at the testing points simply comes out as ones. It is very early work, so the code is very simple. I have cut out just the piece of code with the neural network. Can someone give any tips or help me figure out what is wrong?

        %% Neural Network
        % Rosenbrock function : f = (1 - x1).^2 + 100*(x2-x1.^2).^2
        % X: 25x2 matrix
        % f_samp: 25x1 vector
        % Number of nodes in hidden layer
        nHNodes = 4 ;
        % Input layer has two nodes
        % Output layer has one node
        % Initial weights
        W1 = 2*rand(nHNodes, k) - 1 ;
        W2 = 2*rand(1, nHNodes) - 1 ;
        alpha = 0.5 ; % Learning rate
        % Backpropagation
        for i = 1:10000
            for i = 1:size(X,1)
                x = X(i, :)';
                d = f_samp(i);
                % Forward
                v1 = W1*x;
                y1 = fun_sigmoid(v1);
                v = W2*y1;
                y = fun_sigmoid(v);
                % Backward
                e = d - y;
                delta = y.*(1-y).*e;
                e1 = W2'*delta;
                delta1 = y1.*(1-y1).*e1;
                dW1 = alpha*delta1*x';
                W1 = W1 + dW1;
                dW2 = alpha*delta*y1';
                W2 = W2 + dW2;
            end
        end
        % Testing sampling points
        for i = 1:size(X,1)
            x = X(i, :)' ;
            v1 = W1*x ;
            y1 = fun_sigmoid(v1) ;
            v = W2*y1 ;
            y(i) = fun_sigmoid(v) ;
        end

    submitted by /u/TobiasFred [link] [comments]  ( 1 min )
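    One possible cause, offered as a sketch rather than a confirmed diagnosis: the posted code passes the output through fun_sigmoid, so y can never leave the interval (0, 1), while Rosenbrock values on most sample ranges are far larger than 1. The error d - y then stays positive and pushes y toward 1. The post does not show X, so the 5x5 grid on [-2, 2] below is an assumption, as are the helper names. Dividing the targets by their maximum (or using a linear output layer) is one common way to make them reachable.

```python
import math

def rosenbrock(x1, x2):
    # Same formula as in the post: f = (1 - x1)^2 + 100*(x2 - x1^2)^2
    return (1 - x1) ** 2 + 100 * (x2 - x1 ** 2) ** 2

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

# Hypothetical sample set: a 5x5 grid (25 points, matching the size of X
# in the post, whose actual values are not shown).
grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
targets = [rosenbrock(a, b) for a in grid for b in grid]

# How many targets lie above the sigmoid's upper bound of 1?
n_above_one = sum(t > 1.0 for t in targets)

# One possible remedy: rescale targets into [0, 1] before training,
# then multiply predictions by f_max afterwards.
f_max = max(targets)
scaled = [t / f_max for t in targets]

print("largest target:", f_max)
print("targets above 1:", n_above_one, "of", len(targets))
print("sigmoid(10) is still below 1:", sigmoid(10.0))
```

    With unscaled targets, nearly every sample asks for an output the sigmoid cannot produce, which would be consistent with the "all ones" result described in the post.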
    I need help making a neural network from scratch in python
    Here's my code: https://github.com/heyuhowudoin/mnist_ai I currently have 3 layers, 15, 15 and 10 neurons respectively. I'm using the MNIST database to get images of hand-drawn numbers and trying to classify them by which neuron in the final layer has the highest activation. I use a combination of sigmoid and ReLU as activation functions, and I am using minibatches of 100 pictures each. The thing I see happening is that it converges to a point where every neuron in the final layer outputs a 0 because then it has a relatively low error since the target for 9 out of 10 of those neurons is indeed 0, whereas the target for the correct output neuron is 1. I've tried looking over my backprop algorithm, I've tried changing the neurons in each hidden layer to 200, I've changed the learning speed a bunch but I can't seem to figure out why it's not working so if one of you guys wouldn't mind helping me, that would be much appreciated. submitted by /u/-i-hate-this-place- [link] [comments]  ( 1 min )
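    The linked repository is not reviewed here, so the following is only a commonly suggested alternative, not a diagnosis. With independent sigmoid outputs and a squared-error loss, "all zeros" is a cheap local solution. A softmax output with cross-entropy loss rules it out, because the ten outputs must sum to 1. The function names and the all-zero logits are illustrative assumptions.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability; the result sums to 1.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target):
    # Negative log-probability assigned to the correct class.
    return -math.log(probs[target])

def output_gradient(probs, target):
    # Gradient of the cross-entropy loss w.r.t. the logits: p_k - 1[k == target].
    return [p - (1.0 if k == target else 0.0) for k, p in enumerate(probs)]

logits = [0.0] * 10      # hypothetical untrained network: equal scores for all digits
target = 3               # hypothetical label
probs = softmax(logits)
loss = cross_entropy(probs, target)
grad = output_gradient(probs, target)

print("loss:", loss)
print("gradient on the correct class:", grad[target])
```

    The gradient on the correct class stays large while the prediction is wrong, unlike the sigmoid-plus-squared-error case, where the y*(1-y) factor shrinks it as outputs approach 0.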
    AlphaGo Zero
    Hello, I have an AlphaGo Zero question. Why doesn’t AlphaGo Zero use Q(s,a) to choose its next move in the Monte Carlo tree search? Why does it use π instead? submitted by /u/Skinnybisquit [link] [comments]
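    As a sketch of the distinction the question touches on: in the AlphaGo Zero paper, Q(s,a) is used inside the search, where each simulation picks the child maximizing Q + U (the PUCT rule), while the move actually played is drawn from π, which is proportional to visit counts raised to 1/τ. Visit counts already reflect Q, since high-value children get visited more, and they are less noisy than a Q estimate backed by a single visit. The numbers and the c_puct value below are made up for illustration.

```python
import math

def puct_select(Q, N, P, c_puct=1.0):
    # In-tree selection: argmax over Q(s,a) + c_puct * P(s,a) * sqrt(sum_b N(s,b)) / (1 + N(s,a))
    total = sum(N)
    scores = [q + c_puct * p * math.sqrt(total) / (1 + n)
              for q, n, p in zip(Q, N, P)]
    return max(range(len(scores)), key=scores.__getitem__)

def visit_policy(N, tau=1.0):
    # Root move distribution: pi(a) proportional to N(s,a)^(1/tau)
    weights = [n ** (1.0 / tau) for n in N]
    z = sum(weights)
    return [w / z for w in weights]

# Hypothetical root statistics for three moves.
Q = [0.10, 0.05, 0.90]   # move 2 looks great, but only after one visit
N = [90, 9, 1]
P = [0.5, 0.3, 0.2]

next_to_explore = puct_select(Q, N, P)
pi = visit_policy(N)

print("search explores move", next_to_explore, "next")
print("play distribution pi:", pi)
```

    Here the search would spend its next simulation on the promising but barely visited move, while π still puts most of its mass on the move with the most accumulated evidence.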
  • Open

    Take Your Machine Learning Skills Global
    Sponsored Post In our interconnected world, a decision made thousands of miles away can have lasting consequences for entire organizations or economies. When small changes have big effects, it is unsurprising that companies and governments are turning to machine learning and AI to accurately predict risk. How the Global Community is Applying Machine Learning […] The post Take Your Machine Learning Skills Global appeared first on Machine Learning Mastery.  ( 2 min )
  • Open

    Fast Aquatic Swimmer Optimization with Differentiable Projective Dynamics and Neural Network Hydrodynamic Models. (arXiv:2204.12584v1 [cs.RO])
    Aquatic locomotion is a classic fluid-structure interaction (FSI) problem of interest to biologists and engineers. Solving the fully coupled FSI equations for incompressible Navier-Stokes and finite elasticity is computationally expensive. Optimizing robotic swimmer design within such a system generally involves cumbersome, gradient-free procedures on top of the already costly simulation. To address this challenge we present a novel, fully differentiable hybrid approach to FSI that combines a 2D direct numerical simulation for the deformable solid structure of the swimmer and a physics-constrained neural network surrogate to capture hydrodynamic effects of the fluid. For the deformable simulation of the swimmer's body, we use state-of-the-art techniques from the field of computer graphics to speed up the finite-element method (FEM). For the fluid simulation, we use a U-Net architecture trained with a physics-based loss function to predict the flow field at each time step. The pressure and velocity field outputs from the neural network are sampled around the boundary of our swimmer using an immersed boundary method (IBM) to compute its swimming motion accurately and efficiently. We demonstrate the computational efficiency and differentiability of our hybrid simulator on a 2D carangiform swimmer. Since both the solid simulator and the hydrodynamics model are automatically differentiable, we obtain a fully differentiable FSI simulator that can be used for computational co-design of geometry and controls for rigid and soft bodies immersed in fluids, such as minimizing drag, maximizing speed, or maximizing efficiency via direct gradient-based optimization.  ( 2 min )
    Zero-Touch Network on Industrial IoT: An End-to-End Machine Learning Approach. (arXiv:2204.12605v1 [cs.LG])
    The Industry 4.0-enabled smart factory is expected to realize the next revolution for manufacturers. Although artificial intelligence (AI) technologies have improved productivity, current use cases are limited to small-scale, single-task operations. To unlock the potential of the smart factory, this paper develops zero-touch network systems for intelligent manufacturing and facilitates distributed AI applications in both the training and inference stages at large scale. The open radio access network (O-RAN) architecture is first introduced for the zero-touch platform to enable global control of communications and computation infrastructure capability in the field. The designed serverless framework allows intelligent and efficient learning assignments and resource allocations. Hence, requested learning tasks can be assigned to appropriate robots, and the underlying infrastructure can be used to support the learning tasks without expert knowledge. Moreover, due to the proposed network system's flexibility, powerful AI-enabled networking algorithms can be utilized to ensure service-level agreements and superior performance for factory workloads. Finally, three open research directions (backward compatibility, end-to-end enhancements, and cybersecurity) are discussed for the zero-touch smart factory.  ( 2 min )
    Data-driven detector signal characterization with constrained bottleneck autoencoders. (arXiv:2203.04604v4 [physics.ins-det] UPDATED)
    A common technique in high energy physics is to characterize the response of a detector by means of models tuned to data, which build parametric maps from the physical parameters of the system to the expected signal of the detector. When the underlying model is unknown, it is difficult to apply this method, and simplifying assumptions are often made, introducing modeling errors. In this article, using a waveform toy model, we present how deep learning in the form of constrained bottleneck autoencoders can be used to learn the underlying unknown detector response model directly from data. The results show that excellent performance can be achieved even when the signals are significantly affected by random noise. The trained algorithm can be used simultaneously to estimate the physical parameters of the model, to simulate the detector response with high fidelity, and to denoise detector signals.  ( 2 min )
    A Survey on Machine Learning Approaches for Modelling Intuitive Physics. (arXiv:2202.06481v2 [cs.LG] UPDATED)
    Research in cognitive science has provided extensive evidence of human cognitive ability in performing physical reasoning of objects from noisy perceptual inputs. Such a cognitive ability is commonly known as intuitive physics. With advancements in deep learning, there is an increasing interest in building intelligent systems that are capable of performing physical reasoning from a given scene for the purpose of building better AI systems. As a result, many contemporary approaches in modelling intuitive physics for machine cognition have been inspired by literature from cognitive science. Despite the wide range of work in physical reasoning for machine cognition, there is a scarcity of reviews that organize and group these deep learning approaches. Especially at the intersection of intuitive physics and artificial intelligence, there is a need to make sense of the diverse range of ideas and approaches. Therefore, this paper presents a comprehensive survey of recent advances and techniques in intuitive physics-inspired deep learning approaches for physical reasoning. The survey will first categorize existing deep learning approaches into three facets of physical reasoning before organizing them into three general technical approaches and propose six categorical tasks of the field. Finally, we highlight the challenges of the current field and present some future research directions.  ( 2 min )
    Generating Examples From CLI Usage: Can Transformers Help?. (arXiv:2204.12648v1 [cs.SE])
    Continuous evolution in modern software often causes documentation, tutorials, and examples to be out of sync with changing interfaces and frameworks. Relying on outdated documentation and examples can lead programs to fail or be less efficient or even less secure. In response, programmers need to regularly turn to other resources on the web such as StackOverflow for examples to guide them in writing software. We recognize that this inconvenient, error-prone, and expensive process can be improved by using machine learning applied to software usage data. In this paper, we present our practical system which uses machine learning on large-scale telemetry data and documentation corpora, generating appropriate and complex examples that can be used to improve documentation. We discuss both feature-based and transformer-based machine learning approaches and demonstrate that our system achieves 100% coverage for the used functionalities in the product, providing up-to-date examples upon every release and reduces the numbers of PRs submitted by software owners writing and editing documentation by >68%. We also share valuable lessons learnt during the 3 years that our production quality system has been deployed for Azure Cloud Command Line Interface (Azure CLI).  ( 2 min )
    Generating Self-Serendipity Preference in Recommender Systems for Addressing Cold Start Problems. (arXiv:2204.12651v1 [cs.IR])
    Classical accuracy-oriented Recommender Systems (RSs) typically face the cold-start problem and the filter-bubble problem when users receive familiar, repeated, and even predictable recommendations, leaving them bored and unsatisfied. To address the above issues, serendipity-oriented RSs are proposed to recommend appealing and valuable items that deviate significantly from users' historical interactions, satisfying users by introducing unexplored but relevant candidate items. In this paper, we devise a novel serendipity-oriented recommender system (Generative Self-Serendipity Recommender System, GS$^2$-RS) that generates users' self-serendipity preferences to enhance the recommendation performance. Specifically, this model extracts users' interest and satisfaction preferences, generates virtual but convincible neighbors' preferences from themselves, and achieves their self-serendipity preference. Then these preferences are injected into the rating matrix as additional information for RS models. Note that GS$^2$-RS not only tackles the cold-start problem but also provides diverse but relevant recommendations to relieve the filter-bubble problem. Extensive experiments on benchmark datasets illustrate that the proposed GS$^2$-RS model can significantly outperform the state-of-the-art baseline approaches in serendipity measures with a stable accuracy performance.  ( 2 min )
    AI-Bind: Improving Binding Predictions for Novel Protein Targets and Ligands. (arXiv:2112.13168v4 [q-bio.QM] UPDATED)
    Identifying novel drug-target interactions (DTI) is a critical and rate limiting step in drug discovery. While deep learning models have been proposed to accelerate the identification process, we show that state-of-the-art models fail to generalize to novel (i.e., never-before-seen) structures. We first unveil the mechanisms responsible for this shortcoming, demonstrating how models rely on shortcuts that leverage the topology of the protein-ligand bipartite network, rather than learning the node features. Then, we introduce AI-Bind, a pipeline that combines network-based sampling strategies with unsupervised pre-training, allowing us to limit the annotation imbalance and improve binding predictions for novel proteins and ligands. We illustrate the value of AI-Bind by predicting drugs and natural compounds with binding affinity to SARS-CoV-2 viral proteins and the associated human proteins. We also validate these predictions via auto-docking simulations and comparison with recent experimental evidence, and step up the process of interpreting machine learning prediction of protein-ligand binding by identifying potential active binding sites on the amino acid sequence. Overall, AI-Bind offers a powerful high-throughput approach to identify drug-target combinations, with the potential of becoming a powerful tool in drug discovery.  ( 2 min )
    Trusted Multi-View Classification with Dynamic Evidential Fusion. (arXiv:2204.11423v2 [cs.LG] UPDATED)
    Existing multi-view classification algorithms focus on promoting accuracy by exploiting different views, typically integrating them into common representations for follow-up tasks. Although effective, it is also crucial to ensure the reliability of both the multi-view integration and the final decision, especially for noisy, corrupted and out-of-distribution data. Dynamically assessing the trustworthiness of each view for different samples could provide reliable integration. This can be achieved through uncertainty estimation. With this in mind, we propose a novel multi-view classification algorithm, termed trusted multi-view classification (TMC), providing a new paradigm for multi-view learning by dynamically integrating different views at an evidence level. The proposed TMC can promote classification reliability by considering evidence from each view. Specifically, we introduce the variational Dirichlet to characterize the distribution of the class probabilities, parameterized with evidence from different views and integrated with the Dempster-Shafer theory. The unified learning framework induces accurate uncertainty and accordingly endows the model with both reliability and robustness against possible noise or corruption. Both theoretical and experimental results validate the effectiveness of the proposed model in accuracy, robustness and trustworthiness.  ( 2 min )
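    The evidence-level fusion described above can be sketched as follows. The abstract does not spell out the formulas, so this assumes the usual subjective-logic setup: per-class evidence e_k gives Dirichlet parameters e_k + 1, belief b_k = e_k / S and uncertainty u = K / S with S the Dirichlet strength, and two views are merged with a reduced Dempster combination rule. Function names and evidence values are hypothetical.

```python
def opinion(evidence):
    # Map non-negative per-class evidence to (belief masses, uncertainty).
    # Assumption: alpha_k = e_k + 1, S = sum(alpha), b_k = e_k / S, u = K / S.
    K = len(evidence)
    S = sum(e + 1.0 for e in evidence)
    return [e / S for e in evidence], K / S

def combine(op1, op2):
    # Reduced Dempster rule for two opinions over the same K classes.
    b1, u1 = op1
    b2, u2 = op2
    K = len(b1)
    # Conflict: belief mass the two views assign to different classes.
    conflict = sum(b1[i] * b2[j] for i in range(K) for j in range(K) if i != j)
    scale = 1.0 / (1.0 - conflict)
    b = [scale * (b1[k] * b2[k] + b1[k] * u2 + b2[k] * u1) for k in range(K)]
    u = scale * u1 * u2
    return b, u

view_a = opinion([9.0, 1.0, 0.0])   # confident view
view_b = opinion([0.0, 0.0, 0.0])   # view with no evidence (e.g. corrupted input)
view_c = opinion([8.0, 2.0, 0.0])   # second confident view that agrees with view_a

fused_ab = combine(view_a, view_b)
fused_ac = combine(view_a, view_c)

print("a + uninformative b:", fused_ab)
print("a + agreeing c:     ", fused_ac)
```

    Under these assumptions, a view with no evidence leaves the other view's opinion unchanged, while two agreeing views yield lower uncertainty than either alone, which is the per-sample trust behaviour the abstract describes.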
    An Empirical Study of the Occurrence of Heavy-Tails in Training a ReLU Gate. (arXiv:2204.12554v1 [cs.LG])
    One direction of recent work on stochastic deep-learning algorithms has been uncovering a rather mysterious heavy-tailed nature of the stationary distribution of these algorithms, even when the data distribution is not heavy-tailed. Moreover, the heavy-tail index is known to show interesting dependence on the input dimension of the net, the mini-batch size and the step size of the algorithm. In this short note, we undertake an experimental study of this index for S.G.D. while training a ReLU gate (in the realizable and in the binary classification setup) and for a variant of S.G.D. that was proven in Karmakar and Mukherjee (2022) for ReLU realizable data. From our experiments we conjecture that these two algorithms have similar heavy-tail behaviour on any data where the latter can be proven to converge. Secondly, we demonstrate that the heavy-tail index of the late time iterates in this model scenario has strikingly different properties than either what has been proven for linear hypothesis classes or what has been previously demonstrated for large nets.  ( 2 min )
    Software-Hardware Co-design for Fast and Scalable Training of Deep Learning Recommendation Models. (arXiv:2104.05158v6 [cs.DC] UPDATED)
    Deep learning recommendation models (DLRMs) are used across many business-critical services at Facebook and are the single largest AI application in terms of infrastructure demand in its data-centers. In this paper we discuss the SW/HW co-designed solution for high-performance distributed training of large-scale DLRMs. We introduce a high-performance scalable software stack based on PyTorch and pair it with the new evolution of Zion platform, namely ZionEX. We demonstrate the capability to train very large DLRMs with up to 12 Trillion parameters and show that we can attain 40X speedup in terms of time to solution over previous systems. We achieve this by (i) designing the ZionEX platform with dedicated scale-out network, provisioned with high bandwidth, optimal topology and efficient transport (ii) implementing an optimized PyTorch-based training stack supporting both model and data parallelism (iii) developing sharding algorithms capable of hierarchical partitioning of the embedding tables along row, column dimensions and load balancing them across multiple workers; (iv) adding high-performance core operators while retaining flexibility to support optimizers with fully deterministic updates (v) leveraging reduced precision communications, multi-level memory hierarchy (HBM+DDR+SSD) and pipelining. Furthermore, we develop and briefly comment on distributed data ingestion and other supporting services that are required for the robust and efficient end-to-end training in production environments.  ( 3 min )
    Neural Collapse Inspired Attraction-Repulsion-Balanced Loss for Imbalanced Learning. (arXiv:2204.08735v2 [cs.LG] UPDATED)
    Class-imbalanced distributions are widespread in real-world engineering. However, the mainstream optimization algorithms that seek to minimize error will trap the deep learning model in sub-optima when facing extreme class imbalance. This seriously harms classification precision, especially on the minority classes. The essential reason is that the gradients of the classifier weights are imbalanced among the components from different classes. In this paper, we propose Attraction-Repulsion-Balanced Loss (ARB-Loss) to balance the different components of the gradients. We perform experiments on large-scale classification and segmentation datasets, and our ARB-Loss achieves state-of-the-art performance with only one-stage training instead of the two-stage learning used by current SOTA works.  ( 2 min )
    Performer: A Novel PPG to ECG Reconstruction Transformer For a Digital Biomarker of Cardiovascular Disease Detection. (arXiv:2204.11795v2 [eess.SP] UPDATED)
    Cardiovascular diseases (CVDs) have become the leading cause of death; three-quarters of these deaths occur in lower-income communities. Electrocardiography (ECG), an electrical measurement capturing the cardiac activities, is a gold standard for diagnosing CVDs. However, ECG is infeasible for continuous cardiac monitoring due to its requirement for user participation. Meanwhile, photoplethysmography (PPG) is easy to collect, but the limited accuracy constrains its clinical usage. In this research, a novel Transformer-based architecture, Performer, is invented to reconstruct ECG from PPG and to create a novel digital biomarker, PPG along with its reconstructed ECG, as multiple modalities for CVD detection. This architecture, for the first time, performs Transformer sequence-to-sequence translation on biomedical waveforms, while also utilizing the advantages of the easily accessible PPG and the well-studied base of ECG. Shifted Patch-based Attention (SPA) is created to maximize the signal features by fetching the various sequence lengths as hierarchical stages into the training while also capturing cross-patch connections through the shifted patch mechanism. This architecture generates a state-of-the-art performance of 0.29 RMSE for reconstructing ECG from PPG, achieving an average of 95.9% diagnosis for CVDs on the MIMIC III dataset and 75.9% for diabetes on the PPG-BP dataset. Performer, along with its novel digital biomarker, offers a low-cost and non-invasive solution for continuous cardiac monitoring, only requiring the easily extractable PPG data to reconstruct the not-as-accessible ECG data. As a proof of concept, an earring wearable, named PEARL (prototype), is designed to scale up the point-of-care (POC) healthcare system.  ( 2 min )
    An empirical study of the effect of background data size on the stability of SHapley Additive exPlanations (SHAP) for deep learning models. (arXiv:2204.11351v2 [cs.LG] UPDATED)
    Nowadays, the interpretation of why a machine learning (ML) model makes certain inferences is as crucial as the accuracy of such inferences. Some ML models like the decision tree possess inherent interpretability that can be directly comprehended by humans. Others like artificial neural networks (ANN), however, rely on external methods to uncover the deduction mechanism. SHapley Additive exPlanations (SHAP) is one such external method, which requires a background dataset when interpreting ANNs. Generally, a background dataset consists of instances randomly sampled from the training dataset. However, the sampling size and its effect on SHAP remain unexplored. In our empirical study on the MIMIC-III dataset, we show that the two core explanations, SHAP values and variable rankings, fluctuate when using different background datasets acquired from random sampling, indicating that users cannot unquestioningly trust the one-shot interpretation from SHAP. Luckily, such fluctuation decreases with the increase of the background dataset size. Also, we notice a U-shape in the stability assessment of SHAP variable rankings, demonstrating that SHAP is more reliable in ranking the most and least important variables compared to moderately important ones. Overall, our results suggest that users should take into account how background data affects SHAP results, with improved SHAP stability as the background sample size increases.  ( 2 min )
    Data Debugging with Shapley Importance over End-to-End Machine Learning Pipelines. (arXiv:2204.11131v2 [cs.LG] UPDATED)
    Developing modern machine learning (ML) applications is data-centric, of which one fundamental challenge is to understand the influence of data quality on ML training -- "Which training examples are 'guilty' in making the trained ML model predictions inaccurate or unfair?" Modeling data influence for ML training has attracted intensive interest over the last decade, and one popular framework is to compute the Shapley value of each training example with respect to utilities such as validation accuracy and fairness of the trained ML model. Unfortunately, despite recent intensive interest and research, existing methods only consider a single ML model "in isolation" and do not consider an end-to-end ML pipeline that consists of data transformations, feature extractors, and ML training. We present DataScope (ease.ml/datascope), the first system that efficiently computes Shapley values of training examples over an end-to-end ML pipeline, and illustrate its applications in data debugging for ML training. To this end, we first develop a novel algorithmic framework that computes Shapley value over a specific family of ML pipelines that we call canonical pipelines: a positive relational algebra query followed by a K-nearest-neighbor (KNN) classifier. We show that, for many subfamilies of canonical pipelines, computing Shapley value is in PTIME, contrasting the exponential complexity of computing Shapley value in general. We then put this into practice -- given an sklearn pipeline, we approximate it with a canonical pipeline to use as a proxy. We conduct extensive experiments illustrating different use cases and utilities. Our results show that DataScope is up to four orders of magnitude faster than state-of-the-art Monte Carlo-based methods, while being comparably, and often even more, effective in data debugging.  ( 2 min )
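    To make the "Shapley value of each training example" concrete, here is a brute-force sketch that averages each example's marginal contribution over every ordering. This is the textbook definition with its factorial cost, not DataScope's PTIME algorithm, and the toy utility (two redundant clean examples and one harmful one) is invented for illustration.

```python
from itertools import permutations
from math import factorial

def shapley_values(n, utility):
    # Exact Shapley values for n training examples.
    # utility(frozenset_of_indices) -> float, e.g. validation accuracy
    # of a model trained on that subset.
    phi = [0.0] * n
    for order in permutations(range(n)):
        chosen = set()
        prev = utility(frozenset(chosen))
        for i in order:
            chosen.add(i)
            cur = utility(frozenset(chosen))
            phi[i] += cur - prev      # marginal contribution of example i
            prev = cur
    total = factorial(n)
    return [p / total for p in phi]

def toy_utility(subset):
    # Hypothetical accuracy: examples 0 and 1 are redundant clean copies
    # (either one adds 0.4); example 2 is mislabeled and costs 0.3.
    acc = 0.5
    if 0 in subset or 1 in subset:
        acc += 0.4
    if 2 in subset:
        acc -= 0.3
    return acc

phi = shapley_values(3, toy_utility)
print("Shapley values:", phi)
```

    The redundant examples split the credit equally and the mislabeled one receives a negative value, which is the signal used for data debugging. The loop visits n! orderings, which illustrates why general-purpose computation is exponential.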
    Reinforced Causal Explainer for Graph Neural Networks. (arXiv:2204.11028v2 [cs.LG] UPDATED)
    Explainability is crucial for probing graph neural networks (GNNs), answering questions like "Why does the GNN model make a certain prediction?". Feature attribution is a prevalent technique of highlighting the explanatory subgraph in the input graph, which plausibly leads the GNN model to make its prediction. Various attribution methods exploit gradient-like or attention scores as the attributions of edges, then select the salient edges with top attribution scores as the explanation. However, most of these works make an untenable assumption - the selected edges are linearly independent - thus leaving the dependencies among edges largely unexplored, especially their coalition effect. We demonstrate unambiguous drawbacks of this assumption - making the explanatory subgraph unfaithful and verbose. To address this challenge, we propose a reinforcement learning agent, Reinforced Causal Explainer (RC-Explainer). It frames the explanation task as a sequential decision process - an explanatory subgraph is successively constructed by adding a salient edge to connect the previously selected subgraph. Technically, its policy network predicts the action of edge addition, and gets a reward that quantifies the action's causal effect on the prediction. Such a reward accounts for the dependency between the newly-added edge and the previously-added edges, thus reflecting whether they collaborate and form a coalition to pursue better explanations. As such, RC-Explainer is able to generate faithful and concise explanations, and has a better generalization power to unseen graphs. When explaining different GNNs on three graph classification datasets, RC-Explainer achieves better or comparable performance to SOTA approaches w.r.t. predictive accuracy and contrastivity, and safely passes sanity checks and visual inspections. Code is available at https://github.com/xiangwang1223/reinforced_causal_explainer.  ( 2 min )
    Long-term Spatio-temporal Forecasting via Dynamic Multiple-Graph Attention. (arXiv:2204.11008v2 [cs.LG] UPDATED)
    Many real-world ubiquitous applications, such as parking recommendations and air pollution monitoring, benefit significantly from accurate long-term spatio-temporal forecasting (LSTF). LSTF makes use of long-term dependency between spatial and temporal domains, contextual information, and inherent pattern in the data. Recent studies have revealed the potential of multi-graph neural networks (MGNNs) to improve prediction performance. However, existing MGNN methods cannot be directly applied to LSTF due to several issues: the low level of generality, insufficient use of contextual information, and the imbalanced graph fusion approach. To address these issues, we construct new graph models to represent the contextual information of each node and the long-term spatio-temporal data dependency structure. To fuse the information across multiple graphs, we propose a new dynamic multi-graph fusion module to characterize the correlations of nodes within a graph and the nodes across graphs via the spatial attention and graph attention mechanisms. Furthermore, we introduce a trainable weight tensor to indicate the importance of each node in different graphs. Extensive experiments on two large-scale datasets demonstrate that our proposed approaches significantly improve the performance of existing graph neural network models in LSTF prediction tasks.  ( 2 min )
    Sublinear Time Approximation of Text Similarity Matrices. (arXiv:2112.09631v3 [cs.LG] UPDATED)
    We study algorithms for approximating pairwise similarity matrices that arise in natural language processing. Generally, computing a similarity matrix for $n$ data points requires $\Omega(n^2)$ similarity computations. This quadratic scaling is a significant bottleneck, especially when similarities are computed via expensive functions, e.g., via transformer models. Approximation methods reduce this quadratic complexity, often by using a small subset of exactly computed similarities to approximate the remainder of the complete pairwise similarity matrix. Significant work focuses on the efficient approximation of positive semidefinite (PSD) similarity matrices, which arise e.g., in kernel methods. However, much less is understood about indefinite (non-PSD) similarity matrices, which often arise in NLP. Motivated by the observation that many of these matrices are still somewhat close to PSD, we introduce a generalization of the popular Nyström method to the indefinite setting. Our algorithm can be applied to any similarity matrix and runs in sublinear time in the size of the matrix, producing a rank-$s$ approximation with just $O(ns)$ similarity computations. We show that our method, along with a simple variant of CUR decomposition, performs very well in approximating a variety of similarity matrices arising in NLP tasks. We demonstrate high accuracy of the approximated similarity matrices in the downstream tasks of document classification, sentence similarity, and cross-document coreference.  ( 2 min )
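    For orientation, the classical Nyström approximation that the abstract generalizes can be written as below. This is the standard PSD form only; the paper's modification for indefinite matrices is not described in the abstract and is not reproduced here. The symbols $S$, $C$, and $W$ are notation chosen for this note: $S$ is a set of $s$ sampled landmark indices and $W^{+}$ is the Moore-Penrose pseudoinverse.

```latex
\[
  C = K_{:,S} \in \mathbb{R}^{n \times s}, \qquad
  W = K_{S,S} \in \mathbb{R}^{s \times s}, \qquad
  \tilde{K} = C \, W^{+} C^{\top} \approx K .
\]
```

    Only the $ns$ entries of $C$ need to be computed exactly, since $W$ is a submatrix of $C$, and $\tilde{K}$ has rank at most $s$. This matches the $O(ns)$ similarity computations and rank-$s$ output quoted in the abstract.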
    AstBERT: Enabling Language Model for Financial Code Understanding with Abstract Syntax Trees. (arXiv:2201.07984v2 [cs.AI] UPDATED)
    Using the pre-trained language model (i.e. BERT) to apprehend source codes has attracted increasing attention from financial institutions owing to the great potential to uncover financial risks. However, there are several challenges in applying these language models to directly solve programming language (PL) related problems. To this end, we propose the AstBERT model, a pre-trained language model aiming to better understand the financial PL using the abstract syntax tree (AST). Specifically, we collect a colossal amount of source codes (both Java and Python) from the Alipay code repository and incorporate both syntactic and semantic code knowledge into our model through the help of code parsers, in which AST information of the source codes can be interpreted and integrated. We evaluate the performance of the proposed model on three tasks, including code question answering, code clone detection and code refinement. Experiment results show that our AstBERT achieves promising performance on three downstream tasks.  ( 2 min )
    ROMNet: Renovate the Old Memories. (arXiv:2202.02606v2 [eess.IV] UPDATED)
    Renovating the memories in old photos is an intriguing research topic in computer vision. These legacy images often suffer from severe and commingled degradations such as cracks, noise, and color-fading, while the lack of large-scale paired old photo datasets makes this restoration task very challenging. In this work, we present a novel reference-based end-to-end learning framework that can jointly repair and colorize the degraded legacy pictures. Specifically, the proposed framework consists of three modules: a restoration sub-network for degradation restoration, a similarity sub-network for color histogram matching and transfer, and a colorization subnet that learns to predict the chroma elements of the images conditioned on chromatic reference signals. The whole system takes advantage of the color histogram priors in a given reference image, which vastly reduces the dependency on large-scale training data. Apart from the proposed method, we also create, to our knowledge, the first public and real-world old photo dataset with paired ground truth for evaluating old photo restoration models, wherein each old photo is paired with a pristine image manually restored by Photoshop experts. Our extensive experiments conducted on both synthetic and real-world datasets demonstrate that our method significantly outperforms state-of-the-art methods both quantitatively and qualitatively.  ( 2 min )
    LaPraDoR: Unsupervised Pretrained Dense Retriever for Zero-Shot Text Retrieval. (arXiv:2203.06169v2 [cs.CL] UPDATED)
    In this paper, we propose LaPraDoR, a pretrained dual-tower dense retriever that does not require any supervised data for training. Specifically, we first present Iterative Contrastive Learning (ICoL) that iteratively trains the query and document encoders with a cache mechanism. ICoL not only enlarges the number of negative instances but also keeps representations of cached examples in the same hidden space. We then propose Lexicon-Enhanced Dense Retrieval (LEDR) as a simple yet effective way to enhance dense retrieval with lexical matching. We evaluate LaPraDoR on the recently proposed BEIR benchmark, including 18 datasets of 9 zero-shot text retrieval tasks. Experimental results show that LaPraDoR achieves state-of-the-art performance compared with supervised dense retrieval models, and further analysis reveals the effectiveness of our training strategy and objectives. Compared to re-ranking, our lexicon-enhanced approach can be run in milliseconds (22.5x faster) while achieving superior performance.  ( 2 min )
    Accelerated Proximal Alternating Gradient-Descent-Ascent for Nonconvex Minimax Machine Learning. (arXiv:2112.11663v6 [cs.LG] UPDATED)
    Alternating gradient-descent-ascent (AltGDA) is an optimization algorithm that has been widely used for model training in various machine learning applications, which aims to solve a nonconvex minimax optimization problem. However, existing studies show that it suffers from a high computation complexity in nonconvex minimax optimization. In this paper, we develop a single-loop and fast AltGDA-type algorithm that leverages proximal gradient updates and momentum acceleration to solve regularized nonconvex minimax optimization problems. By leveraging the momentum acceleration technique, we prove that the algorithm converges to a critical point in nonconvex minimax optimization and achieves a computation complexity in the order of $\mathcal{O}(\kappa^{\frac{11}{6}}\epsilon^{-2})$, where $\epsilon$ is the desired level of accuracy and $\kappa$ is the problem's condition number. Such a computation complexity improves the state-of-the-art complexities of single-loop GDA and AltGDA algorithms (see the summary of comparison in the paper's table). We demonstrate the effectiveness of our algorithm via an experiment on adversarial deep learning.  ( 2 min )
    Closing the Gap between Single-User and Multi-User VoiceFilter-Lite. (arXiv:2202.12169v2 [eess.AS] UPDATED)
    VoiceFilter-Lite is a speaker-conditioned voice separation model that plays a crucial role in improving speech recognition and speaker verification by suppressing overlapping speech from non-target speakers. However, one limitation of VoiceFilter-Lite, and other speaker-conditioned speech models in general, is that these models are usually limited to a single target speaker. This is undesirable as most smart home devices now support multiple enrolled users. In order to extend the benefits of personalization to multiple users, we previously developed an attention-based speaker selection mechanism and applied it to VoiceFilter-Lite. However, the original multi-user VoiceFilter-Lite model suffers from significant performance degradation compared with single-user models. In this paper, we devised a series of experiments to improve the multi-user VoiceFilter-Lite model. By incorporating a dual learning rate schedule and by using feature-wise linear modulation (FiLM) to condition the model with the attended speaker embedding, we successfully closed the performance gap between multi-user and single-user VoiceFilter-Lite models on single-speaker evaluations. At the same time, the new model can also be easily extended to support any number of users, and significantly outperforms our previously published model on multi-speaker evaluations.  ( 2 min )
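As a rough illustration of feature-wise linear modulation (FiLM), the sketch below scales and shifts each feature channel using parameters computed from a conditioning embedding. The linear projections, shapes, and all names are assumptions for illustration; the abstract only states that FiLM conditions the model on the attended speaker embedding.

```python
import numpy as np


def film(features, embedding, w_gamma, b_gamma, w_beta, b_beta):
    """Feature-wise linear modulation: gamma(e) * features + beta(e).

    features:  (frames, channels) activations to be modulated
    embedding: (embed_dim,) conditioning vector, e.g. a speaker embedding
    The two linear projections below are hypothetical stand-ins for learned layers.
    """
    gamma = embedding @ w_gamma + b_gamma  # per-channel scale, shape (channels,)
    beta = embedding @ w_beta + b_beta     # per-channel shift, shape (channels,)
    # Broadcasting applies the same scale and shift to every frame.
    return features * gamma + beta


features = np.ones((4, 3))
embedding = np.array([1.0, 0.0])
w_gamma = np.array([[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]])
w_beta = np.zeros((2, 3))
modulated = film(features, embedding, w_gamma, np.zeros(3), w_beta, np.full(3, 0.5))
```

With these toy weights every frame becomes `[1.5, 2.5, 3.5]`: the embedding selects a per-channel scale of `[1, 2, 3]` and the shift adds `0.5`.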
    GNMR: A provable one-line algorithm for low rank matrix recovery. (arXiv:2106.12933v3 [math.OC] UPDATED)
    Low rank matrix recovery problems, including matrix completion and matrix sensing, appear in a broad range of applications. In this work we present GNMR -- an extremely simple iterative algorithm for low rank matrix recovery, based on a Gauss-Newton linearization. On the theoretical front, we derive recovery guarantees for GNMR in both the matrix sensing and matrix completion settings. Some of these results improve upon the best currently known for other methods. A key property of GNMR is that it implicitly keeps the factor matrices approximately balanced throughout its iterations. On the empirical front, we show that for matrix completion with uniform sampling, GNMR performs better than several popular methods, especially when given very few observations close to the information limit.  ( 2 min )
    Differentially Private SGDA for Minimax Problems. (arXiv:2201.09046v3 [cs.LG] UPDATED)
    Stochastic gradient descent ascent (SGDA) and its variants have been the workhorse for solving minimax problems. However, in contrast to the well-studied stochastic gradient descent (SGD) with differential privacy (DP) constraints, there is little work on understanding the generalization (utility) of SGDA with DP constraints. In this paper, we use the algorithmic stability approach to establish the generalization (utility) of DP-SGDA in different settings. In particular, for the convex-concave setting, we prove that DP-SGDA can achieve an optimal utility rate in terms of the weak primal-dual population risk in both smooth and non-smooth cases. To the best of our knowledge, this is the first known result for DP-SGDA in the non-smooth case. We further provide its utility analysis in the nonconvex-strongly-concave setting, which is the first known result in terms of the primal population risk. The convergence and generalization results for this nonconvex setting are new even in the non-private setting. Finally, numerical experiments are conducted to demonstrate the effectiveness of DP-SGDA for both convex and nonconvex cases.  ( 2 min )
    ISNet: Costless and Implicit Image Segmentation for Deep Classifiers, with Application in COVID-19 Detection. (arXiv:2202.00232v3 [eess.IV] UPDATED)
    This work proposes a novel deep neural network (DNN) architecture, Implicit Segmentation Neural Network (ISNet), to solve the task of image segmentation followed by classification. It substitutes the common pipeline of two DNNs with a single model. We designed the ISNet for high flexibility and performance: it allows virtually any classification neural network architecture to analyze a common image as if it had been previously segmented. Furthermore, in relation to the unmodified classifier, the ISNet does not cause any increment in computational cost at run-time. We test the architecture with two applications: COVID-19 detection in chest X-rays, and facial attribute estimation. We implement an ISNet based on a DenseNet121 classifier, and compare the model to a U-net (performing lung/face segmentation) followed by a DenseNet121, and to a standalone DenseNet121. The new architecture matched the other DNNs in facial attribute estimation. Moreover, it strongly surpassed them in COVID-19 detection, according to an external test dataset. The ISNet precisely ignored the image regions outside of the lungs or faces. Therefore, in COVID-19 detection it reduced the effects of background bias and shortcut learning, and it improved security in facial attribute estimation. ISNet presents an accurate, fast, and light methodology. The successful implicit segmentation, considering two largely diverse fields, highlights the architecture's general applicability.  ( 3 min )
    Variational Learning for Unsupervised Knowledge Grounded Dialogs. (arXiv:2112.00653v3 [cs.CL] UPDATED)
    Recent methods for knowledge grounded dialogs generate responses by incorporating information from an external textual document. These methods do not require the exact document to be known during training and rely on the use of a retrieval system to fetch relevant documents from a large index. The documents used to generate the responses are modeled as latent variables whose prior probabilities need to be estimated. Models such as RAG and REALM marginalize the document probabilities over the documents retrieved from the index to define the log-likelihood loss function, which is optimized end-to-end. In this paper, we develop a variational approach to the above technique wherein we instead maximize the evidence lower bound (ELBO). Using a collection of three publicly available open-conversation datasets, we demonstrate how the posterior distribution, which has information from the ground-truth response, allows for a better approximation of the objective function during training. To overcome the challenges associated with sampling over a large knowledge collection, we develop an efficient approach to approximate the ELBO. To the best of our knowledge, we are the first to apply variational training for open-scale unsupervised knowledge grounded dialog systems.  ( 2 min )
    Single-pass Object-adaptive Data Undersampling and Reconstruction for MRI. (arXiv:2111.09212v2 [eess.IV] UPDATED)
    There is much recent interest in techniques to accelerate the data acquisition process in MRI by acquiring limited measurements. Often, sophisticated reconstruction algorithms are deployed to maintain high image quality in such settings. In this work, we propose a data-driven sampler using a convolutional neural network, MNet, to provide object-specific sampling patterns adaptive to each scanned object. The network observes very limited low-frequency k-space data for each object and rapidly predicts, in a single pass, the desired undersampling pattern that achieves high image reconstruction quality. We propose an accompanying alternating-type training framework with a mask-backward procedure that efficiently generates training labels for the sampler network and jointly trains an image reconstruction network. Experimental results on the fastMRI knee dataset demonstrate the ability of the proposed learned undersampling network to generate object-specific masks at fourfold and eightfold acceleration that achieve better image reconstruction performance than several existing schemes. The source code for the proposed joint sampling and reconstruction learning framework is available at https://github.com/zhishenhuang/mri.  ( 2 min )
    Efficient Learning of the Parameters of Non-Linear Models using Differentiable Resampling in Particle Filters. (arXiv:2111.01409v2 [stat.ML] UPDATED)
    It has been widely documented that the sampling and resampling steps in particle filters cannot be differentiated. The reparameterisation trick was introduced to allow the sampling step to be reformulated into a differentiable function. We extend the reparameterisation trick to include the stochastic input to resampling, thereby limiting the discontinuities in the gradient calculation after this step. Knowing the gradients of the prior and likelihood allows us to run particle Markov Chain Monte Carlo (p-MCMC) and use the No-U-Turn Sampler (NUTS) as the proposal when estimating parameters. We compare the Metropolis-adjusted Langevin algorithm (MALA), Hamiltonian Monte Carlo with different numbers of steps, and NUTS. We consider two state-space models and show that NUTS improves the mixing of the Markov chain and can produce more accurate results in less computational time.  ( 2 min )
    Building separable approximations for quantum states via neural networks. (arXiv:2112.08055v4 [quant-ph] UPDATED)
    Finding the closest separable state to a given target state is a notoriously difficult task, even more difficult than deciding whether a state is entangled or separable. To tackle this task, we parametrize separable states with a neural network and train it to minimize the distance to a given target state, with respect to a differentiable distance, such as the trace distance or Hilbert-Schmidt distance. By examining the output of the algorithm, we obtain an upper bound on the entanglement of the target state, and construct an approximation for its closest separable state. We benchmark the method on a variety of well-known classes of bipartite states and find excellent agreement, even up to local dimension of $d=10$, while providing conjectures and analytic insight for isotropic and Werner states. Moreover, we show our method to be efficient in the multipartite case, considering different notions of separability. Examining three- and four-party GHZ and W states, we recover known bounds and obtain novel ones, for instance for triseparability.  ( 2 min )
    Automatic Synthesis of Diverse Weak Supervision Sources for Behavior Analysis. (arXiv:2111.15186v2 [cs.LG] UPDATED)
    Obtaining annotations for large training sets is expensive, especially in settings where domain knowledge is required, such as behavior analysis. Weak supervision has been studied to reduce annotation costs by using weak labels from task-specific labeling functions (LFs) to augment ground truth labels. However, domain experts still need to hand-craft different LFs for different tasks, limiting scalability. To reduce expert effort, we present AutoSWAP: a framework for automatically synthesizing data-efficient task-level LFs. The key to our approach is to efficiently represent expert knowledge in a reusable domain-specific language and more general domain-level LFs, with which we use state-of-the-art program synthesis techniques and a small labeled dataset to generate task-level LFs. Additionally, we propose a novel structural diversity cost that allows for efficient synthesis of diverse sets of LFs, further improving AutoSWAP's performance. We evaluate AutoSWAP in three behavior analysis domains and demonstrate that AutoSWAP outperforms existing approaches using only a fraction of the data. Our results suggest that AutoSWAP is an effective way to automatically generate LFs that can significantly reduce expert effort for behavior analysis.  ( 2 min )
    Regularized Newton Method with Global $O(1/k^2)$ Convergence. (arXiv:2112.02089v2 [math.OC] UPDATED)
    We present a Newton-type method that converges fast from any initialization and for arbitrary convex objectives with Lipschitz Hessians. We achieve this by merging the ideas of cubic regularization with a certain adaptive Levenberg-Marquardt penalty. In particular, we show that the iterates given by $x^{k+1}=x^k - \bigl(\nabla^2 f(x^k) + \sqrt{H\|\nabla f(x^k)\|} \mathbf{I}\bigr)^{-1}\nabla f(x^k)$, where $H>0$ is a constant, converge globally with a $\mathcal{O}(\frac{1}{k^2})$ rate. Our method is the first variant of Newton's method that has both cheap iterations and provably fast global convergence. Moreover, we prove that locally our method converges superlinearly when the objective is strongly convex. To boost the method's performance, we present a line search procedure that does not need hyperparameters and is provably efficient.  ( 2 min )
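The update rule quoted in the abstract is simple enough to try directly. The following is a minimal sketch of that iteration only, without the line search the abstract also mentions. The toy objective $f(x)=\sum_i \log\cosh(x_i - t_i)$, the choice $H=1$, the stopping tolerance, and all names are assumptions made for illustration.

```python
import numpy as np


def regularized_newton(grad, hess, x0, H, max_iter=200, tol=1e-10):
    """Iterate x <- x - (hess(x) + sqrt(H * ||grad(x)||) I)^{-1} grad(x)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        g_norm = np.linalg.norm(g)
        if g_norm < tol:
            break
        # Adaptive Levenberg-Marquardt penalty from the quoted formula.
        lam = np.sqrt(H * g_norm)
        x = x - np.linalg.solve(hess(x) + lam * np.eye(x.size), g)
    return x


# Toy convex objective f(x) = sum_i log(cosh(x_i - t_i)), minimised at x = t.
target = np.array([1.0, -2.0, 0.5])
grad = lambda x: np.tanh(x - target)
hess = lambda x: np.diag(1.0 - np.tanh(x - target) ** 2)

x_star = regularized_newton(grad, hess, x0=[6.0, -6.0, 4.0], H=1.0)
```

The starting point is deliberately far from the minimiser, where the Hessian of this objective is nearly singular and an unregularized Newton step would be very large. The penalty term keeps each step bounded, and the iterates settle on `target`.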
    Physics-Driven Learning of Wasserstein GAN for Density Reconstruction in Dynamic Tomography. (arXiv:2110.15424v2 [eess.IV] UPDATED)
    Object density reconstruction from projections containing scattered radiation and noise is of critical importance in many applications. Existing scatter correction and density reconstruction methods may not provide the high accuracy needed in many applications and can break down in the presence of unmodeled or anomalous scatter and other experimental artifacts. Incorporating machine-learned models could prove beneficial for accurate density reconstruction particularly in dynamic imaging, where the time-evolution of the density fields could be captured by partial differential equations or by learning from hydrodynamics simulations. In this work, we demonstrate the ability of learned deep neural networks to perform artifact removal in noisy density reconstructions, where the noise is imperfectly characterized. We use a Wasserstein generative adversarial network (WGAN), where the generator serves as a denoiser that removes artifacts in densities obtained from traditional reconstruction algorithms. We train the networks from large density time-series datasets, with noise simulated according to parametric random distributions that may mimic noise in experiments. The WGAN is trained with noisy density frames as generator inputs, to match the generator outputs to the distribution of clean densities (time-series) from simulations. A supervised loss is also included in the training, which leads to improved density restoration performance. In addition, we employ physics-based constraints such as mass conservation during network training and application to further enable highly accurate density reconstructions. Our preliminary numerical results show that the models trained in our frameworks can remove significant portions of unknown noise in density time-series data.  ( 2 min )
    A Study of Fake News Reading and Annotating in Social Media Context. (arXiv:2109.12523v2 [cs.HC] UPDATED)
    The online spreading of fake news is a major issue threatening entire societies. Much of this spreading is enabled by new media formats, namely social networks and online media sites. Researchers and practitioners have been trying to address this by characterizing fake news and devising automated methods for detecting it. The detection methods have so far had only limited success, mostly due to the complexity of the news content and context and the lack of properly annotated datasets. One possible way to boost the efficiency of automated misinformation detection methods is to imitate the detection work of humans. It is also important to understand the news consumption behavior of online users. In this paper, we present an eye-tracking study in which we let 44 lay participants casually read through a social media feed containing posts with news articles, some of which were fake. In a second run, we asked the participants to decide on the truthfulness of these articles. We also describe a follow-up qualitative study with a similar scenario, but this time with 7 expert fake news annotators. We present the description of both studies, characteristics of the resulting dataset (which we hereby publish), and several findings.  ( 2 min )
    Encoding Involutory Invariances in Neural Networks. (arXiv:2106.12891v2 [cs.LG] UPDATED)
    In certain situations, neural networks are trained upon data that obey underlying symmetries. However, the predictions do not respect the symmetries exactly unless they are embedded in the network structure. In this work, we introduce architectures that embed a special kind of symmetry, namely invariance with respect to involutory linear/affine transformations up to parity $p=\pm 1$. We provide rigorous theorems to show that the proposed network ensures such an invariance and present qualitative arguments for a special universal approximation theorem. An adaptation of our techniques to CNN tasks for datasets with inherent horizontal/vertical reflection symmetry is demonstrated. Extensive experiments indicate that the proposed model outperforms baseline feed-forward and physics-informed neural networks while identically respecting the underlying symmetry.  ( 2 min )
    Optimal Epidemic Control as a Contextual Combinatorial Bandit with Budget. (arXiv:2106.15808v2 [cs.LG] UPDATED)
    In light of the COVID-19 pandemic, it is an open challenge and critical practical problem to find an optimal way to dynamically prescribe the best policies that balance both the governmental resources and epidemic control in different countries and regions. To solve this multi-dimensional tradeoff of exploitation and exploration, we formulate this technical challenge as a contextual combinatorial bandit problem that jointly optimizes a multi-criteria reward function. Given the historical daily cases in a region and the past intervention plans in place, the agent should generate useful intervention plans that policy makers can implement in real time to minimize both the number of daily COVID-19 cases and the stringency of the recommended interventions. We prove this concept with simulations of multiple realistic policy-making scenarios and demonstrate a clear advantage in providing a Pareto optimal solution in the epidemic intervention problem.  ( 2 min )
    Leveraging power grid topology in machine learning assisted optimal power flow. (arXiv:2110.00306v3 [cs.LG] UPDATED)
    Machine learning assisted optimal power flow (OPF) aims to reduce the computational complexity of these non-linear and non-convex constrained optimization problems by consigning expensive (online) optimization to offline training. The majority of work in this area typically employs fully connected neural networks (FCNN). However, recently convolutional (CNN) and graph (GNN) neural networks have also been investigated, in an effort to exploit topological information within the power grid. Although promising results have been obtained, a systematic comparison between these architectures is lacking in the literature. Accordingly, we introduce a concise framework for generalizing methods for machine learning assisted OPF and assess the performance of a variety of FCNN, CNN and GNN models for two fundamental approaches in this domain: regression (predicting optimal generator set-points) and classification (predicting the active set of constraints). For several synthetic power grids with interconnected utilities, we show that locality properties between feature and target variables are scarce and subsequently demonstrate marginal utility of applying CNN and GNN architectures compared to FCNN for a fixed grid topology. However, with variable topology (for instance, modeling transmission line contingency), GNN models are able to straightforwardly take the change of topological information into account and outperform both FCNN and CNN models.  ( 2 min )
    Maximum Entropy Dueling Network Architecture in Atari Domain. (arXiv:2107.14457v2 [cs.LG] UPDATED)
    In recent years, there have been many deep structures for Reinforcement Learning, mainly for value function estimation and representations. These methods achieved great success in the Atari 2600 domain. In this paper, we propose an improved architecture based upon Dueling Networks. In this architecture, there are two separate estimators: one approximates the state value function and the other the state advantage function. This improvement, based on Maximum Entropy, shows better policy evaluation compared to the original network and other value-based architectures in the Atari domain.  ( 2 min )
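For readers unfamiliar with the two-estimator design, the sketch below shows the standard dueling aggregation, which combines a state value and per-action advantages into Q-values by subtracting the mean advantage. It illustrates only the baseline Dueling Network combination; the abstract does not describe how the maximum-entropy variant changes it. The function name and shapes are assumptions.

```python
import numpy as np


def dueling_q_values(state_value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    state_value = np.asarray(state_value, dtype=float).reshape(-1, 1)  # (batch, 1)
    advantages = np.asarray(advantages, dtype=float)                   # (batch, actions)
    # Subtracting the mean advantage makes the V/A split identifiable.
    return state_value + advantages - advantages.mean(axis=1, keepdims=True)


# One state with value 1.0 and three actions with advantages 1, 2, 3.
q = dueling_q_values([1.0], [[1.0, 2.0, 3.0]])
```

Here the mean advantage is 2, so the Q-values are `[0.0, 1.0, 2.0]`. Their mean equals the state value, which is the property the mean subtraction is designed to enforce.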
    Back2Future: Leveraging Backfill Dynamics for Improving Real-time Predictions in Future. (arXiv:2106.04420v8 [cs.LG] UPDATED)
    In real-time forecasting in public health, data collection is a non-trivial and demanding task. Often, after its initial release, the data undergoes several revisions (possibly due to human or technical constraints); as a result, it may take weeks until the data reaches a stable value. This so-called 'backfill' phenomenon and its effect on model performance have barely been studied in the prior literature. In this paper, we introduce the multi-variate backfill problem using COVID-19 as the motivating example. We construct a detailed dataset composed of relevant signals over the past year of the pandemic. We then systematically characterize several patterns in backfill dynamics and leverage our observations to formulate a novel problem and a neural framework, Back2Future, that aims to refine a given model's predictions in real time. Our extensive experiments demonstrate that our method refines the performance of top models for COVID-19 forecasting, in contrast to non-trivial baselines, yielding an 18% improvement over baselines and enabling us to obtain a new SOTA performance. In addition, we show that our model improves model evaluation too; hence policy-makers can better understand the true accuracy of forecasting models in real time.  ( 3 min )
    SALIENCE: An Unsupervised User Adaptation Model for Multiple Wearable Sensors Based Human Activity Recognition. (arXiv:2108.10213v2 [eess.SP] UPDATED)
    Unsupervised user adaptation aligns the feature distributions of the data from training users and the new user, so a well-trained wearable human activity recognition (WHAR) model can be well adapted to the new user. With the development of wearable sensors, WHAR based on multiple wearable sensors is gaining more and more attention. To address the challenge that different sensors have different transferabilities, we propose the SALIENCE (unsupervised user adaptation model for multiple wearable sensors based human activity recognition) model. It aligns the data of each sensor separately to achieve local alignment, while uniformly aligning the data of all sensors to ensure global alignment. In addition, an attention mechanism is proposed to focus the activity classifier of SALIENCE on the sensors with strong feature discrimination and good distribution alignment. Experiments are conducted on two public WHAR datasets, and the experimental results show that our model yields competitive performance.  ( 2 min )
    SeqDialN: Sequential Visual Dialog Networks in Joint Visual-Linguistic Representation Space. (arXiv:2008.00397v2 [cs.CV] UPDATED)
    In this work, we formulate a visual dialog as an information flow in which each piece of information is encoded with the joint visual-linguistic representation of a single dialog round. Based on this formulation, we consider the visual dialog task as a sequence problem consisting of ordered visual-linguistic vectors. For featurization, we use a Dense Symmetric Co-Attention network as a lightweight vision-language joint representation generator to fuse multimodal features (i.e., image and text), yielding better computation and data efficiencies. For inference, we propose two Sequential Dialog Networks (SeqDialN): the first uses LSTM for information propagation (IP) and the second uses a modified Transformer for multi-step reasoning (MR). Our architecture separates the complexity of multimodal feature fusion from that of inference, which allows simpler design of the inference engine. IP based SeqDialN is our baseline with a simple 2-layer LSTM design that achieves decent performance. MR based SeqDialN, on the other hand, recurrently refines the semantic question/history representations through the self-attention stack of Transformer and produces promising results on the visual dialog task. On the VisDial v1.0 test-std dataset, our best single generative SeqDialN achieves 62.54% NDCG and 48.63% MRR; our ensemble generative SeqDialN achieves 63.78% NDCG and 49.98% MRR, setting a new state of the art for generative visual dialog models. We fine-tune discriminative SeqDialN with dense annotations and boost the performance up to 72.41% NDCG and 55.11% MRR. In this work, we discuss the extensive experiments we have conducted to demonstrate the effectiveness of our model components. We also provide visualization for the reasoning process from the relevant conversation rounds and discuss our fine-tuning methods. Our code is available at https://github.com/xiaoxiaoheimei/SeqDialN  ( 3 min )
    IH-GAN: A Conditional Generative Model for Implicit Surface-Based Inverse Design of Cellular Structures. (arXiv:2103.02588v4 [cs.CE] UPDATED)
    Variable-density cellular structures can overcome connectivity and manufacturability issues of topologically optimized structures, particularly those represented as discrete density maps. However, the optimization of such cellular structures is challenging due to the multiscale design problem. Past work addressing this problem generally either only optimizes the volume fraction of single-type unit cells but ignores the effects of unit cell geometry on properties, or considers the geometry-property relation but builds this relation via heuristics. In contrast, we propose a simple yet more principled way to accurately model the property to geometry mapping using a conditional deep generative model, named Inverse Homogenization Generative Adversarial Network (IH-GAN). It learns the conditional distribution of unit cell geometries given properties and can realize the one-to-many mapping from properties to geometries. We further reduce the complexity of IH-GAN by using the implicit function parameterization to represent unit cell geometries. Results show that our method can 1) generate various unit cells that satisfy given material properties with high accuracy ($R^2$-scores between target properties and properties of generated unit cells $>98\%$) and 2) improve the optimized structural performance over the conventional variable-density single-type structure. In the minimum compliance example, our IH-GAN generated structure achieves a $79.7\%$ reduction in concentrated stress and an extra $3.03\%$ reduction in displacement. In the target deformation examples, our IH-GAN generated structure reduces the target matching error by $86.4\%$ and $79.6\%$ for two test cases, respectively. We also demonstrated that the connectivity issue for multi-type unit cells can be solved by transition layer blending.  ( 3 min )
    Unify Local and Global Information for Top-$N$ Recommendation. (arXiv:2012.01635v2 [cs.IR] UPDATED)
    Knowledge graph (KG), integrating complex information and containing rich semantics, is widely considered as side information to enhance recommendation systems. However, most of the existing KG-based methods concentrate on encoding the structural information in the graph, without utilizing the collaborative signals in user-item interaction data, which are important for understanding user preferences. Therefore, the representations learned by these models are insufficient for representing semantic information of users and items in the recommendation environment. The combination of both kinds of data provides a good chance to solve this problem. To tackle this research gap, we propose a novel duet representation learning framework named KADM to fuse local information (user-item interaction data) and global information (external knowledge graph) for the top-$N$ recommendation, which is composed of two separate sub-models. One learns the local representations by discovering the inner correlations in local information with a knowledge-aware co-attention mechanism, and another learns the global representations by encoding the knowledge associations in global information with a relation-aware attention network. The two sub-models are jointly trained as part of the semantic fusion network to compute the user preferences, which discriminates the contribution of the two sub-models under the special context. We conduct experiments on two real-world datasets, and the evaluations show that KADM significantly outperforms state-of-the-art methods. Further ablation studies confirm that the duet architecture performs significantly better than either sub-model on the recommendation tasks.  ( 2 min )
    Variational Kalman Filtering with Hinf-Based Correction for Robust Bayesian Learning in High Dimensions. (arXiv:2204.13089v1 [stat.ML])
    In this paper, we address the problem of convergence of the sequential variational inference filter (VIF) through the application of a robust variational objective and Hinf-norm based correction for a linear Gaussian system. As the dimension of the state or parameter space grows, performing the full Kalman update with the dense covariance matrix for a large scale system requires increased storage and computational complexity, making it impractical. The VIF approach, based on mean-field Gaussian variational inference, reduces this burden through the variational approximation to the covariance, usually in the form of a diagonal covariance approximation. The challenge is to retain convergence and correct for biases introduced by the sequential VIF steps. We desire a framework that improves feasibility while still maintaining reasonable proximity to the optimal Kalman filter as data is assimilated. To accomplish this goal, an Hinf-norm based optimization perturbs the VIF covariance matrix to improve robustness. This yields a novel VIF-Hinf recursion that employs consecutive variational inference and Hinf based optimization steps. We explore the development of this method and investigate a numerical example to illustrate the effectiveness of the proposed filter.  ( 2 min )
    Exoskeleton-Based Multimodal Action and Movement Recognition: Identifying and Developing the Optimal Boosted Learning Approach. (arXiv:2106.10331v2 [cs.RO] UPDATED)
    This paper makes two scientific contributions to the field of exoskeleton-based action and movement recognition. First, it presents a novel machine learning and pattern recognition-based framework that can detect a wide range of actions and movements - walking, walking upstairs, walking downstairs, sitting, standing, lying, stand to sit, sit to stand, sit to lie, lie to sit, stand to lie, and lie to stand, with an overall accuracy of 82.63%. Second, it presents a comprehensive comparative study of different learning approaches - Random Forest, Artificial Neural Network, Decision Tree, Multiway Decision Tree, Support Vector Machine, k-NN, Gradient Boosted Trees, Decision Stump, AutoMLP, Linear Regression, Vector Linear Regression, Random Tree, Naïve Bayes, Naïve Bayes (Kernel), Linear Discriminant Analysis, Quadratic Discriminant Analysis, and Deep Learning applied to this framework. The performance of each of these learning approaches was boosted by using the AdaBoost algorithm, and the Cross Validation approach was used for training and testing. The results show that in boosted form, the k-NN classifier outperforms all the other boosted learning approaches and is, therefore, the optimal learning method for this purpose. The results presented and discussed uphold the importance of this work to contribute towards augmenting the abilities of exoskeleton-based assisted and independent living of the elderly in the future of Internet of Things-based living environments, such as Smart Homes. As a specific use case, we also discuss how the findings of our work are relevant for augmenting the capabilities of the Hybrid Assistive Limb exoskeleton, a highly functional lower limb exoskeleton.  ( 2 min )
    Residual Contrastive Learning for Image Reconstruction: Learning Transferable Representations from Noisy Images. (arXiv:2106.10070v2 [cs.CV] UPDATED)
    This paper is concerned with contrastive learning (CL) for low-level image restoration and enhancement tasks. We propose a new label-efficient learning paradigm based on residuals, residual contrastive learning (RCL), and derive an unsupervised visual representation learning framework, suitable for low-level vision tasks with noisy inputs. While supervised image reconstruction aims to minimize residual terms directly, RCL alternatively builds a connection between residuals and CL by defining a novel instance discrimination pretext task, using residuals as the discriminative feature. Our formulation mitigates the severe task misalignment between instance discrimination pretext tasks and downstream image reconstruction tasks, present in existing CL frameworks. Experimentally, we find that RCL can learn robust and transferable representations that improve the performance of various downstream tasks, such as denoising and super resolution, in comparison with recent self-supervised methods designed specifically for noisy inputs. Additionally, our unsupervised pre-training can significantly reduce annotation costs whilst maintaining performance competitive with fully-supervised image reconstruction.  ( 2 min )
    Neural String Edit Distance. (arXiv:2104.08388v2 [cs.CL] UPDATED)
    We propose the neural string edit distance model for string-pair matching and string transduction based on learnable string edit distance. We modify the original expectation-maximization learned edit distance algorithm into a differentiable loss function, allowing us to integrate it into a neural network providing a contextual representation of the input. We evaluate on cognate detection, transliteration, and grapheme-to-phoneme conversion, and show that we can trade off between performance and interpretability in a single framework. Using contextual representations, which are difficult to interpret, we match the performance of state-of-the-art string-pair matching models. Using static embeddings and a slightly different loss function, we force interpretability, at the expense of an accuracy drop.  ( 2 min )
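    For orientation, the classic (non-learned) edit distance that learnable variants build on can be computed with a small dynamic program. The sketch below is the standard Levenshtein recurrence with fixed unit costs; it is not the paper's model, which learns the operation scores, and the function name is ours.

```python
def levenshtein(source: str, target: str) -> int:
    """Classic edit distance with unit insert/delete/substitute costs."""
    # previous[j] = distance between the source prefix processed so far
    # and target[:j]; it starts as the distance from the empty string.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]  # distance from source[:i] to the empty target
        for j, t_char in enumerate(target, start=1):
            substitution = previous[j - 1] + (s_char != t_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))
```

    A learned variant would replace the fixed costs of 1 with scores predicted from the characters (or their contextual representations), which is where the trade-off between accuracy and interpretability mentioned in the abstract arises.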
    Federated Reconstruction: Partially Local Federated Learning. (arXiv:2102.03448v6 [cs.LG] UPDATED)
    Personalization methods in federated learning aim to balance the benefits of federated and local training for data availability, communication cost, and robustness to client heterogeneity. Approaches that require clients to communicate all model parameters can be undesirable due to privacy and communication constraints. Other approaches require always-available or stateful clients, impractical in large-scale cross-device settings. We introduce Federated Reconstruction, the first model-agnostic framework for partially local federated learning suitable for training and inference at scale. We motivate the framework via a connection to model-agnostic meta learning, empirically demonstrate its performance over existing approaches for collaborative filtering and next word prediction, and release an open-source library for evaluating approaches in this setting. We also describe the successful deployment of this approach at scale for federated collaborative filtering in a mobile keyboard application.  ( 2 min )
    Rethinking the Promotion Brought by Contrastive Learning to Semi-Supervised Node Classification. (arXiv:2012.07437v2 [cs.LG] UPDATED)
    Graph Contrastive Learning (GCL) has proven highly effective in promoting the performance of Semi-Supervised Node Classification (SSNC). However, existing GCL methods are generally transferred from other fields like CV or NLP, whose underlying working mechanism remains under-explored. In this work, we first deeply probe the working mechanism of GCL in SSNC, and find that the promotion brought by GCL is severely unevenly distributed: the improvement mainly comes from subgraphs with less annotated information, which is fundamentally different from contrastive learning in other fields. However, existing GCL methods generally ignore this uneven distribution of annotated information and apply GCL evenly to the whole graph. To remedy this issue and further improve GCL in SSNC, we propose the Topology InFormation gain-Aware Graph Contrastive Learning (TIFA-GCL) framework that considers the annotated information distribution across graph in GCL. Extensive experiments on six benchmark graph datasets, including the enormous OGB-Products graph, show that TIFA-GCL can bring a larger improvement than existing GCL methods in both transductive and inductive settings. Further experiments demonstrate the generalizability and interpretability of TIFA-GCL.  ( 2 min )
    Towards assessing agricultural land suitability with causal machine learning. (arXiv:2204.12956v1 [cs.LG])
    Understanding the suitability of agricultural land for applying specific management practices is of great importance for sustainable and resilient agriculture against climate change. Recent developments in the field of causal machine learning enable the estimation of intervention impacts on an outcome of interest, for samples described by a set of observed characteristics. We introduce an extensible data-driven framework that leverages earth observations and frames agricultural land suitability as a geospatial impact assessment problem, where the estimated effects of agricultural practices on agroecosystems serve as a land suitability score and guide decision making. We formulate this as a causal machine learning task and discuss how this approach can be used for agricultural planning in a changing climate. Specifically, we extract the agricultural management practices of "crop rotation" and "landscape crop diversity" from crop type maps, account for climate and land use data, and use double machine learning to estimate their heterogeneous effect on Net Primary Productivity (NPP), within the Flanders region of Belgium from 2010 to 2020. We find that the effect of crop rotation was insignificant, while landscape crop diversity had a small negative effect on NPP. Finally, we observe considerable effect heterogeneity in space for both practices and analyze it.  ( 2 min )
    Differentially Quantized Gradient Methods. (arXiv:2002.02508v4 [cs.LG] UPDATED)
    Consider the following distributed optimization scenario. A worker has access to training data that it uses to compute the gradients while a server decides when to stop iterative computation based on its target accuracy or delay constraints. The server receives all its information about the problem instance from the worker via a rate-limited noiseless communication channel. We introduce the principle we call Differential Quantization (DQ) that prescribes compensating the past quantization errors to direct the descent trajectory of a quantized algorithm towards that of its unquantized counterpart. Assuming that the objective function is smooth and strongly convex, we prove that Differentially Quantized Gradient Descent (DQ-GD) attains a linear contraction factor of $\max\{\sigma_{\mathrm{GD}}, \rho_n 2^{-R}\}$, where $\sigma_{\mathrm{GD}}$ is the contraction factor of unquantized gradient descent (GD), $\rho_n \geq 1$ is the covering efficiency of the quantizer, and $R$ is the bitrate per problem dimension $n$. Thus at any $R\geq\log_2 (\rho_n /\sigma_{\mathrm{GD}})$ bits, the contraction factor of DQ-GD is the same as that of unquantized GD, i.e., there is no loss due to quantization. We show that no algorithm within a certain class can converge faster than $\max\{\sigma_{\mathrm{GD}}, 2^{-R}\}$. Since quantizers exist with $\rho_n \to 1$ as $n \to \infty$ (Rogers, 1963), this means that DQ-GD is asymptotically optimal. The principle of differential quantization continues to apply to gradient methods with momentum such as Nesterov's accelerated gradient descent, and Polyak's heavy ball method. For these algorithms as well, if the rate is above a certain threshold, there is no loss in contraction factor obtained by the differentially quantized algorithm compared to its unquantized counterpart. Experimental results on least-squares problems validate our theoretical analysis.  ( 3 min )
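    The rate threshold quoted in the abstract follows directly from the contraction factor: $\rho_n 2^{-R} \leq \sigma_{\mathrm{GD}}$ holds exactly when $R \geq \log_2(\rho_n/\sigma_{\mathrm{GD}})$. A minimal numeric sketch of that arithmetic follows; the function names and the example values are ours, chosen only for illustration, and the code does not implement the DQ-GD algorithm itself.

```python
import math


def dq_gd_contraction(sigma_gd: float, rho_n: float, rate_bits: float) -> float:
    """Contraction factor max{sigma_GD, rho_n * 2^(-R)} stated in the abstract."""
    return max(sigma_gd, rho_n * 2.0 ** (-rate_bits))


def lossless_rate_threshold(sigma_gd: float, rho_n: float) -> float:
    """Smallest rate R (bits per dimension) at which rho_n * 2^(-R) <= sigma_GD."""
    return math.log2(rho_n / sigma_gd)


# Illustrative values only: sigma_GD = 0.5, rho_n = 2.
threshold = lossless_rate_threshold(0.5, 2.0)
for rate in (1.0, threshold, 3.0):
    print(rate, dq_gd_contraction(0.5, 2.0, rate))
```

    Below the threshold the quantization term dominates the maximum; at or above it the factor equals that of unquantized GD, which is the "no loss due to quantization" regime.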
    Network Classification Based Structural Analysis of Real Networks and their Model-Generated Counterparts. (arXiv:1810.08498v4 [cs.SI] UPDATED)
    Data-driven analysis of complex networks has been a focus of research for decades. An important area of research is to study how well real networks can be described with a small selection of metrics, and how well network models can capture the relations between graph metrics observed in real networks. In this paper, we apply machine learning techniques to investigate the aforementioned problems. We study 500 real-world networks along with 2,000 synthetic networks generated by four frequently used network models with previously calibrated parameters to make the generated graphs as similar to the real networks as possible. This paper unifies several branches of data-driven complex network analysis, such as the study of graph metrics and their pair-wise relationships, network similarity estimation, model calibration, and graph classification. We find that the correlation profiles of the structural measures significantly differ across network domains and the domain can be efficiently determined using a small selection of graph metrics. The structural properties of the network models with fixed parameters are robust enough to perform parameter calibration. The goodness-of-fit of the network models highly depends on the network domain. By solving classification problems, we find that the models lack the capability of generating a graph with a high clustering coefficient and relatively large diameter simultaneously. On the other hand, models are able to capture exactly the degree-distribution-related metrics.  ( 2 min )
    Sequence-Based Target Coin Prediction for Cryptocurrency Pump-and-Dump. (arXiv:2204.12929v1 [q-fin.ST])
    As the pump-and-dump schemes (P&Ds) proliferate in the cryptocurrency market, it becomes imperative to detect such fraudulent activities in advance, to inform potentially susceptible investors before they become victims. In this paper, we focus on the target coin prediction task, i.e., to predict the pump probability of all coins listed in the target exchange before a pump. We conduct a comprehensive study of the latest P&Ds, investigate 709 events organized in Telegram channels from Jan. 2019 to Jan. 2022, and unearth some abnormal yet interesting patterns of P&Ds. Empirical analysis demonstrates that pumped coins exhibit intra-channel homogeneity and inter-channel heterogeneity, which inspires us to develop a novel sequence-based neural network named SNN. Specifically, SNN encodes each channel's pump history as a sequence representation via a positional attention mechanism, which filters useful information and alleviates the noise introduced when the sequence length is long. We also identify and address the coin-side cold-start problem in a practical setting. Extensive experiments show a lift of 1.6% AUC and 41.0% Hit Ratio@3 brought by our method, making it well-suited for real-world application. As a side contribution, we release the source code of our entire data science pipeline on GitHub, along with the dataset tailored for studying the latest P&Ds.  ( 2 min )
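    The Hit Ratio@3 figure above is a top-k ranking metric. A minimal sketch of how such a metric is commonly computed is given below, assuming one true target coin per pump event and a ranked list of candidate coins per event; the paper's exact definition may differ, and the function name and coin symbols are illustrative only.

```python
from typing import Sequence


def hit_ratio_at_k(
    ranked_lists: Sequence[Sequence[str]],
    true_items: Sequence[str],
    k: int = 3,
) -> float:
    """Fraction of events whose true item appears among the top-k ranked items."""
    if len(ranked_lists) != len(true_items):
        raise ValueError("need exactly one true item per ranked list")
    if not ranked_lists:
        return 0.0
    hits = sum(
        1 for ranking, truth in zip(ranked_lists, true_items) if truth in ranking[:k]
    )
    return hits / len(ranked_lists)


# Two hypothetical events; each list is ordered from most to least likely.
rankings = [["BTC", "ETH", "DOGE", "XRP"], ["XRP", "DOGE", "BTC", "ETH"]]
targets = ["DOGE", "ETH"]
print(hit_ratio_at_k(rankings, targets, k=3))
```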
    Accurate inference of crowdsourcing properties when using efficient allocation strategies. (arXiv:1903.03104v2 [cs.LG] UPDATED)
    Allocation strategies improve the efficiency of crowdsourcing by decreasing the work needed to complete individual tasks accurately. However, these algorithms introduce bias by preferentially allocating workers onto easy tasks, leading to sets of completed tasks that are no longer representative of all tasks. This bias challenges inference of problem-wide properties such as typical task difficulty or crowd properties such as worker completion times, important information that goes beyond the crowd responses themselves. Here we study inference about problem properties when using an allocation algorithm to improve crowd efficiency. We introduce Decision-Explicit Probability Sampling (DEPS), a novel method to perform inference of problem properties while accounting for the potential bias introduced by an allocation strategy. Experiments on real and synthetic crowdsourcing data show that DEPS outperforms baseline inference methods while still leveraging the efficiency gains of the allocation method. The ability to perform accurate inference of general properties when using non-representative data allows crowdsourcers to extract more knowledge out of a given crowdsourced dataset.  ( 2 min )
    An Iterative Labeling Method for Annotating Fisheries Imagery. (arXiv:2204.12934v1 [cs.LG])
    In this paper, we present a methodology for fisheries-related data that allows us to converge on a labeled image dataset by iterating over the dataset with multiple training and production loops that can exploit crowdsourcing interfaces. We present our algorithm and its results on two separate sets of image data collected using the Seabed autonomous underwater vehicle. The first dataset comprises 2,026 completely unlabeled images, while the second consists of 21,968 images that were point annotated by experts. Our results indicate that training with a small subset and iterating on that to build a larger set of labeled data allows us to converge to a fully annotated dataset with a small number of iterations. Even in the case of a dataset labeled by experts, a single iteration of the methodology improves the labels by discovering additional complicated examples of labels associated with fish that overlap, are very small, or are obscured by the contrast limitations associated with underwater imagery.  ( 2 min )
    Treating Crowdsourcing as Examination: How to Score Tasks and Online Workers?. (arXiv:2204.13065v1 [cs.HC])
    Crowdsourcing is an online outsourcing mode that can meet current machine learning algorithms' urgent need for massive labeled data. A requester posts tasks on a crowdsourcing platform, which employs online workers over the Internet to complete the tasks, then aggregates and returns the results to the requester. How to model the interaction between different types of workers and tasks is an active research topic. In this paper, we model workers as four types based on their ability: expert, normal worker, sloppy worker and spammer, and divide tasks into hard, medium and easy according to their difficulty. We believe that even experts struggle with difficult tasks while sloppy workers can get easy tasks right, and spammers always deliberately give wrong answers. So, good examination tasks should have a moderate degree of difficulty and discriminability to score workers more objectively. Thus, we first score workers' ability mainly on the medium-difficulty tasks, then reduce the weight of answers from sloppy workers and modify the answers from spammers when inferring the tasks' ground truth. A probabilistic graphical model is adopted to simulate the task execution process, and an iterative method is adopted to calculate and update the ground truth, the ability of workers and the difficulty of the task successively. We verify the correctness and effectiveness of our algorithm in both simulated and real crowdsourcing scenes.  ( 2 min )
    NFT Appraisal Prediction: Utilizing Search Trends, Public Market Data, Linear Regression and Recurrent Neural Networks. (arXiv:2204.12932v1 [q-fin.ST])
    In this paper we investigate the correlation between NFT valuations and various features from three primary categories: public market data, NFT metadata, and social trends data.  ( 2 min )
    Bisimulation Makes Analogies in Goal-Conditioned Reinforcement Learning. (arXiv:2204.13060v1 [cs.LG])
    Building generalizable goal-conditioned agents from rich observations is a key to reinforcement learning (RL) solving real world problems. Traditionally in goal-conditioned RL, an agent is provided with the exact goal they intend to reach. However, it is often not realistic to know the configuration of the goal before performing a task. A more scalable framework would allow us to provide the agent with an example of an analogous task, and have the agent then infer what the goal should be for its current state. We propose a new form of state abstraction called goal-conditioned bisimulation that captures functional equivariance, allowing for the reuse of skills to achieve new goals. We learn this representation using a metric form of this abstraction, and show its ability to generalize to new goals in simulation manipulation tasks. Further, we prove that this learned representation is sufficient not only for goal conditioned tasks, but is amenable to any downstream task described by a state-only reward function. Videos can be found at https://sites.google.com/view/gc-bisimulation.  ( 2 min )
    Faster online calibration without randomization: interval forecasts and the power of two choices. (arXiv:2204.13087v1 [cs.LG])
    We study the problem of making calibrated probabilistic forecasts for a binary sequence generated by an adversarial nature. Following the seminal paper of Foster and Vohra (1998), nature is often modeled as an adaptive adversary who sees all activity of the forecaster except the randomization that the forecaster may deploy. A number of papers have proposed randomized forecasting strategies that achieve an $\epsilon$-calibration error rate of $O(1/\sqrt{T})$, which we prove is tight in general. On the other hand, it is well known that it is not possible to be calibrated without randomization, or if nature also sees the forecaster's randomization; in both cases the calibration error could be $\Omega(1)$. Inspired by the equally seminal works on the "power of two choices" and imprecise probability theory, we study a small variant of the standard online calibration problem. The adversary gives the forecaster the option of making two nearby probabilistic forecasts, or equivalently an interval forecast of small width, and the endpoint closest to the revealed outcome is used to judge calibration. This power of two choices, or imprecise forecast, accords the forecaster with significant power -- we show that a faster $\epsilon$-calibration rate of $O(1/T)$ can be achieved even without deploying any randomization.  ( 2 min )
    Can deep learning match the efficiency of human visual long-term memory to store object details?. (arXiv:2204.13061v1 [cs.LG])
    Humans have a remarkably large capacity to store detailed visual information in long-term memory even after a single exposure, as demonstrated by classic experiments in psychology. For example, Standing (1973) showed that humans could recognize with high accuracy thousands of pictures that they had seen only once a few days prior to a recognition test. In deep learning, the primary mode of incorporating new information into a model is through gradient descent in the model's parameter space. This paper asks whether deep learning via gradient descent can match the efficiency of human visual long-term memory to incorporate new information in a rigorous, head-to-head, quantitative comparison. We answer this in the negative: even in the best case, models learning via gradient descent appear to require approximately 10 exposures to the same visual materials in order to reach a recognition memory performance humans achieve after only a single exposure. Prior knowledge induced via pretraining and bigger model sizes improve performance, but these improvements are not very visible after a single exposure (it takes a few exposures for the improvements to become apparent), suggesting that simply scaling up the pretraining data size or model size might not be enough for the model to reach human-level memory efficiency.  ( 2 min )
    NLU++: A Multi-Label, Slot-Rich, Generalisable Dataset for Natural Language Understanding in Task-Oriented Dialogue. (arXiv:2204.13021v1 [cs.CL])
    We present NLU++, a novel dataset for natural language understanding (NLU) in task-oriented dialogue (ToD) systems, with the aim to provide a much more challenging evaluation environment for dialogue NLU models, up to date with the current application and industry requirements. NLU++ is divided into two domains (BANKING and HOTELS) and brings several crucial improvements over current commonly used NLU datasets. 1) NLU++ provides fine-grained domain ontologies with a large set of challenging multi-intent sentences, introducing and validating the idea of intent modules that can be combined into complex intents that convey complex user goals, combined with finer-grained and thus more challenging slot sets. 2) The ontology is divided into domain-specific and generic (i.e., domain-universal) intent modules that overlap across domains, promoting cross-domain reusability of annotated examples. 3) The dataset design has been inspired by the problems observed in industrial ToD systems, and 4) it has been collected, filtered and carefully annotated by dialogue NLU experts, yielding high-quality annotated data. Finally, we benchmark a series of current state-of-the-art NLU models on NLU++; the results demonstrate the challenging nature of the dataset, especially in low-data regimes, the validity of 'intent modularisation', and call for further research on ToD NLU.  ( 2 min )
    Dropout Inference with Non-Uniform Weight Scaling. (arXiv:2204.13047v1 [cs.LG])
    Dropout as regularization has been used extensively to prevent overfitting for training neural networks. During training, units and their connections are randomly dropped, which could be considered as sampling many different submodels from the original model. At test time, weight scaling and Monte Carlo approximation are two widely applied approaches to approximate the outputs. Both approaches work well practically when all submodels are low-bias complex learners. However, in this work, we demonstrate scenarios where some submodels behave closer to high-bias models and a non-uniform weight scaling is a better approximation for inference.  ( 2 min )
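    To make the two test-time strategies concrete, the sketch below compares uniform weight scaling against the exact average over all dropout submodels for a single unit with input dropout. It enumerates every mask rather than sampling, so it is an exact stand-in for the Monte Carlo approximation; it does not implement the paper's non-uniform scaling, and all names and values are ours, for illustration only.

```python
from itertools import product
from typing import Callable, Sequence


def relu(value: float) -> float:
    return max(0.0, value)


def identity(value: float) -> float:
    return value


def weight_scaling_output(
    weights: Sequence[float],
    inputs: Sequence[float],
    keep_prob: float,
    activation: Callable[[float], float],
) -> float:
    """Single forward pass with every weight scaled by the keep probability."""
    pre_activation = sum(keep_prob * w * x for w, x in zip(weights, inputs))
    return activation(pre_activation)


def exact_ensemble_output(
    weights: Sequence[float],
    inputs: Sequence[float],
    keep_prob: float,
    activation: Callable[[float], float],
) -> float:
    """Probability-weighted average of the unit's output over all dropout masks."""
    total = 0.0
    for mask in product((0, 1), repeat=len(inputs)):
        kept = sum(mask)
        probability = keep_prob**kept * (1.0 - keep_prob) ** (len(mask) - kept)
        pre_activation = sum(m * w * x for m, w, x in zip(mask, weights, inputs))
        total += probability * activation(pre_activation)
    return total


weights, inputs, keep = [0.5, -1.0, 2.0], [1.0, 2.0, 3.0], 0.5
for name, act in (("linear", identity), ("relu", relu)):
    scaled = weight_scaling_output(weights, inputs, keep, act)
    averaged = exact_ensemble_output(weights, inputs, keep, act)
    print(name, scaled, averaged)
```

    For a linear unit the two quantities coincide, whereas with a ReLU the scaled pass and the submodel average differ, which is why weight scaling is only an approximation in general.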
    Binding Actions to Objects in World Models. (arXiv:2204.13022v1 [cs.LG])
    We study the problem of binding actions to objects in object-factored world models using action-attention mechanisms. We propose two attention mechanisms for binding actions to objects, soft attention and hard attention, which we evaluate in the context of structured world models for five environments. Our experiments show that hard attention helps contrastively-trained structured world models to learn to separate individual objects in an object-based grid-world environment. Further, we show that soft attention increases performance of factored world models trained on a robotic manipulation task. The learned action attention weights can be used to interpret the factored world model as the attention focuses on the manipulated object in the environment.  ( 2 min )
    Unsupervised Learning of Unbiased Visual Representations. (arXiv:2204.12941v1 [cs.LG])
    Deep neural networks are known for their inability to learn robust representations when biases exist in the dataset. This results in a poor generalization to unbiased datasets, as the predictions strongly rely on peripheral and confounding factors, which are erroneously learned by the network. Many existing works deal with this issue by either employing an explicit supervision on the bias attributes, or assuming prior knowledge about the bias. In this work we study this problem in a more difficult scenario, in which no explicit annotation about the bias is available, and without any prior knowledge about its nature. We propose a fully unsupervised debiasing framework, consisting of three steps: first, we exploit the natural preference for learning malignant biases, obtaining a bias-capturing model; then, we perform a pseudo-labelling step to obtain bias labels; finally we employ state-of-the-art supervised debiasing techniques to obtain an unbiased model. We also propose a theoretical framework to assess the biasness of a model, and provide a detailed analysis on how biases affect the training of neural networks. We perform experiments on synthetic and real-world datasets, showing that our method achieves state-of-the-art performance in a variety of settings, sometimes even higher than fully supervised debiasing approaches.  ( 2 min )
    Learning to Transfer Role Assignment Across Team Sizes. (arXiv:2204.12937v1 [cs.LG])
    Multi-agent reinforcement learning holds the key for solving complex tasks that demand the coordination of learning agents. However, strong coordination often leads to expensive exploration over the exponentially large state-action space. A powerful approach is to decompose team works into roles, which are ideally assigned to agents with the relevant skills. Training agents to adaptively choose and play emerging roles in a team thus allows the team to scale to complex tasks and quickly adapt to changing environments. These promises, however, have not been fully realised by current role-based multi-agent reinforcement learning methods as they assume either a pre-defined role structure or a fixed team size. We propose a framework to learn role assignment and transfer across team sizes. In particular, we train a role assignment network for small teams by demonstration and transfer the network to larger teams, which continue to learn through interaction with the environment. We demonstrate that re-using the role-based credit assignment structure can foster the learning process of larger reinforcement learning teams to achieve tasks requiring different roles. Our proposal outperforms competing techniques in enriched role-enforcing Prey-Predator games and in new scenarios in the StarCraft II Micro-Management benchmark.  ( 2 min )
    Multi-Objective Physics-Guided Recurrent Neural Networks for Identifying Non-Autonomous Dynamical Systems. (arXiv:2204.12972v1 [eess.SY])
    While trade-offs between modeling effort and model accuracy remain a major concern with system identification, resorting to data-driven methods often leads to a complete disregard for physical plausibility. To address this issue, we propose a physics-guided hybrid approach for modeling non-autonomous systems under control. Starting from a traditional physics-based model, this is extended by a recurrent neural network and trained using a sophisticated multi-objective strategy yielding physically plausible models. While purely data-driven methods fail to produce satisfying results, experiments conducted on real data reveal substantial accuracy improvements by our approach compared to a physics-based model.  ( 2 min )
    Domain Knowledge-Infused Deep Learning for Automated Analog/Radio-Frequency Circuit Parameter Optimization. (arXiv:2204.12948v1 [cs.LG])
    The design automation of analog circuits is a longstanding challenge. This paper presents a reinforcement learning method enhanced by graph learning to automate the analog circuit parameter optimization at the pre-layout stage, i.e., finding device parameters to fulfill desired circuit specifications. Unlike all prior methods, our approach is inspired by human experts who rely on domain knowledge of analog circuit design (e.g., circuit topology and couplings between circuit specifications) to tackle the problem. By originally incorporating such key domain knowledge into policy training with a multimodal network, the method best learns the complex relations between circuit parameters and design targets, enabling optimal decisions in the optimization process. Experimental results on exemplary circuits show it achieves human-level design accuracy (99%) with 1.5X the efficiency of existing best-performing methods. Our method also shows better generalization ability to unseen specifications and optimality in circuit performance optimization. Moreover, it applies to designing radio-frequency circuits on emerging semiconductor technologies, breaking the limitations of prior learning methods in designing conventional analog circuits.  ( 2 min )
    Meshless method stencil evaluation with machine learning. (arXiv:2204.12940v1 [cs.LG])
    Meshless methods are an active and modern branch of numerical analysis with many intriguing benefits. One of the main open research questions related to local meshless methods is how to select the best possible stencil - a collection of neighbouring nodes - to base the calculation on. In this paper, we describe the procedure for generating a labelled stencil dataset and use a variation of pointNet - a deep learning network based on point clouds - to create a classifier for the quality of the stencil. We exploit features of pointNet to implement a model that can be used to classify differently sized stencils and compare it against models dedicated to a single stencil size. The model is particularly good at detecting the best and the worst stencils with a respectable area under the curve (AUC) metric of around 0.90. There is much potential for further improvement and direct application in the meshless domain.  ( 2 min )
    GypSum: Learning Hybrid Representations for Code Summarization. (arXiv:2204.12916v1 [cs.SE])
    Code summarization with deep learning has been widely studied in recent years. Current deep learning models for code summarization generally follow the principle in neural machine translation and adopt the encoder-decoder framework, where the encoder learns the semantic representations from source code and the decoder transforms the learnt representations into human-readable text that describes the functionality of code snippets. Although they achieve state-of-the-art performance, we notice that current models often either generate less fluent summaries or fail to capture the core functionality, since they usually focus on a single type of code representation. As such, we propose GypSum, a new deep learning model that learns hybrid representations using graph attention neural networks and a pre-trained programming and natural language model. We introduce particular edges related to the control flow of a code snippet into the abstract syntax tree for graph construction, and design two encoders to learn from the graph and the token sequence of source code, respectively. We modify the encoder-decoder sublayer in the Transformer's decoder to fuse the representations and propose a dual-copy mechanism to facilitate summary generation. Experimental results demonstrate the superior performance of GypSum over existing code summarization models.  ( 2 min )
    A Bayesian Approach To Graph Partitioning. (arXiv:2204.12927v1 [cs.LG])
    A new algorithm based on Bayesian inference for learning local graph conductance with a Gaussian process (GP) is given. It uses advanced MCMC convergence ideas to create a scalable and fast algorithm for convergence to the stationary distribution, which is used to learn the behavior of conductance when traversing the undirected weighted graph. First, metric embedding is used to represent the vertices of the graph. Then, uniform induced conductance is calculated for training points. Finally, in the learning step, a Gaussian process is used to approximate the uniform induced conductance. MCMC is used to measure the uncertainty of the estimated hyper-parameters.  ( 2 min )
    Using the Projected Belief Network at High Dimensions. (arXiv:2204.12922v1 [cs.LG])
    The projected belief network (PBN) is a layered generative network (LGN) with tractable likelihood function, and is based on a feed-forward neural network (FFNN). There are two versions of the PBN: stochastic and deterministic (D-PBN), and each has theoretical advantages over other LGNs. However, implementation of the PBN requires an iterative algorithm that includes the inversion of a symmetric matrix of size M X M in each layer, where M is the layer output dimension. This, and the fact that the network must be always dimension-reducing in each layer, can limit the types of problems where the PBN can be applied. In this paper, we describe techniques to avoid or mitigate these restrictions and use the PBN effectively at high dimension. We apply the discriminatively aligned PBN (PBN-DA) to classifying and auto-encoding high-dimensional spectrograms of acoustic events. We also present the discriminatively aligned D-PBN for the first time.  ( 2 min )
    Discovering Quantum Phase Transitions with Fermionic Neural Networks. (arXiv:2202.05183v2 [physics.comp-ph] UPDATED)
    Deep neural networks have been extremely successful as highly accurate wave function ansätze for variational Monte Carlo calculations of molecular ground states. We present an extension of one such ansatz, FermiNet, to calculations of the ground states of periodic Hamiltonians, and study the homogeneous electron gas. FermiNet calculations of the ground-state energies of small electron gas systems are in excellent agreement with previous initiator full configuration interaction quantum Monte Carlo and diffusion Monte Carlo calculations. We investigate the spin-polarized homogeneous electron gas and demonstrate that the same neural network architecture is capable of accurately representing both the delocalized Fermi liquid state and the localized Wigner crystal state. The network is given no a priori knowledge that a phase transition exists, but converges on the translationally invariant ground state at high density and spontaneously breaks the symmetry to produce the crystalline ground state at low density.  ( 2 min )
    Variance-Reduced Heterogeneous Federated Learning via Stratified Client Selection. (arXiv:2201.05762v2 [cs.LG] UPDATED)
    Client selection strategies are widely adopted to handle the communication-efficiency problem in recent studies of Federated Learning (FL). However, due to the large variance of the selected subset's update, prior selection approaches with a limited sampling ratio cannot perform well on convergence and accuracy in heterogeneous FL. To address this problem, in this paper, we propose a novel stratified client selection scheme to reduce the variance for the pursuit of better convergence and higher accuracy. Specifically, to mitigate the impact of heterogeneity, we develop stratification based on clients' local data distribution to derive approximately homogeneous strata for better selection in each stratum. Concentrating on a limited sampling ratio scenario, we next present an optimized sample size allocation scheme by considering the diversity of each stratum's variability, with the promise of further variance reduction. Theoretically, we elaborate the explicit relation among different selection schemes with regard to variance; under heterogeneous settings, we demonstrate the effectiveness of our selection scheme. Experimental results confirm that our approach not only allows for better performance relative to state-of-the-art methods but also is compatible with prevalent FL algorithms.  ( 2 min )
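To make the idea of variance-aware sample size allocation across strata concrete, here is a minimal sketch. It uses classic Neyman allocation (sample counts proportional to stratum size times stratum standard deviation), which is a textbook technique swapped in for illustration and not necessarily the allocation rule derived in the paper. The function name `neyman_allocation` and its inputs are hypothetical.

```python
from typing import List, Sequence


def neyman_allocation(sizes: Sequence[int], stds: Sequence[float], budget: int) -> List[int]:
    """Split `budget` client slots across strata proportionally to size * std.

    Assumption: this is textbook Neyman allocation, used here only to
    illustrate variance-aware allocation; it does not cap a stratum's
    share at the stratum size.
    """
    weights = [n * s for n, s in zip(sizes, stds)]
    total = sum(weights)
    if total == 0:
        # All strata look homogeneous: fall back to proportional allocation.
        weights = [float(n) for n in sizes]
        total = sum(weights)
    raw = [budget * w / total for w in weights]
    alloc = [int(r) for r in raw]  # floor of each fractional share
    leftovers = budget - sum(alloc)
    # Hand out the remaining slots to the largest fractional remainders.
    order = sorted(range(len(raw)), key=lambda i: raw[i] - alloc[i], reverse=True)
    for i in order[:leftovers]:
        alloc[i] += 1
    return alloc
```

With two equally sized strata, the one with three times the variability receives three times as many slots, e.g. `neyman_allocation([10, 10], [1.0, 3.0], 4)`.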
    Explainable k-means. Don't be greedy, plant bigger trees!. (arXiv:2111.03193v2 [cs.LG] UPDATED)
    We provide a new bi-criteria $\tilde{O}(\log^2 k)$ competitive algorithm for explainable $k$-means clustering. Explainable $k$-means was recently introduced by Dasgupta, Frost, Moshkovitz, and Rashtchian (ICML 2020). It is described by an easy to interpret and understand (threshold) decision tree or diagram. The cost of the explainable $k$-means clustering equals the sum of the costs of its clusters, and the cost of each cluster equals the sum of squared distances from the points in the cluster to the center of that cluster. The best non bi-criteria algorithm for explainable clustering is $\tilde{O}(k)$ competitive, and this bound is tight. Our randomized bi-criteria algorithm constructs a threshold decision tree that partitions the data set into $(1+\delta)k$ clusters (where $\delta\in (0,1)$ is a parameter of the algorithm). The cost of this clustering is at most $\tilde{O}(1/ \delta \cdot \log^2 k)$ times the cost of the optimal unconstrained $k$-means clustering. We show that this bound is almost optimal.  ( 2 min )
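The cost definition in this abstract (sum over clusters of squared distances to the cluster center) can be written down directly. The sketch below assumes the center is the cluster mean and shows a single axis-aligned threshold cut of the kind a threshold decision tree is built from. The helper names are hypothetical and this is not the paper's algorithm, only the objective it is measured against.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def centroid(points: Sequence[Point]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def clustering_cost(clusters: Sequence[Sequence[Point]]) -> float:
    """k-means cost: sum of squared distances from each point to its cluster mean."""
    total = 0.0
    for cluster in clusters:
        if not cluster:
            continue  # an empty cluster contributes no cost
        c = centroid(cluster)
        total += sum(sum((x - y) ** 2 for x, y in zip(p, c)) for p in cluster)
    return total


def threshold_split(points: Sequence[Point], feature: int,
                    threshold: float) -> Tuple[List[Point], List[Point]]:
    """One node of a threshold tree: split on `points[feature] <= threshold`."""
    left = [p for p in points if p[feature] <= threshold]
    right = [p for p in points if p[feature] > threshold]
    return left, right
```

For the four points `(0,0), (2,0), (10,0), (12,0)`, cutting feature 0 at 5.0 lowers the cost from that of a single cluster to that of two tight clusters.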
    Certified Robustness via Randomized Smoothing over Multiplicative Parameters. (arXiv:2106.14432v2 [cs.LG] UPDATED)
    Currently, the most popular method of providing robustness certificates is randomized smoothing, where an input is smoothed via some probability distribution. We propose a novel approach to randomized smoothing over multiplicative parameters. Using this method, we construct certifiably robust classifiers with respect to a gamma correction perturbation and compare the result with classifiers obtained via other smoothing distributions (Gaussian, Laplace, uniform). The experiments show that the asymmetrical Rayleigh distribution yields better certificates for some values of the perturbation parameters. To the best of our knowledge, this is the first work concerning certified robustness against the multiplicative gamma correction transformation and the first to study the effects of asymmetrical distributions in randomized smoothing.  ( 2 min )
    Generalization Bounds with Minimal Dependency on Hypothesis Class via Distributionally Robust Optimization. (arXiv:2106.11180v2 [math.OC] UPDATED)
    Established approaches to obtain generalization bounds in data-driven optimization and machine learning mostly build on solutions from empirical risk minimization (ERM), which depend crucially on the functional complexity of the hypothesis class. In this paper, we present an alternate route to obtain these bounds on the solution from distributionally robust optimization (DRO), a recent data-driven optimization framework based on worst-case analysis and the notion of ambiguity set to capture statistical uncertainty. In contrast to the hypothesis class complexity in ERM, our DRO bounds depend on the ambiguity set geometry and its compatibility with the true loss function. Notably, when using maximum mean discrepancy as a DRO distance metric, our analysis implies generalization bounds that depend solely on the true loss function. To the best of our knowledge, it is the first generalization bound in the literature that is entirely independent of any other candidates in the hypothesis class. We hope our findings can open the door for a better understanding of DRO, especially its benefits on loss minimization and other machine learning applications.  ( 2 min )
    BINAS: Bilinear Interpretable Neural Architecture Search. (arXiv:2110.12399v3 [cs.LG] UPDATED)
    Practical use of neural networks often involves requirements on latency, energy and memory among others. A popular approach to find networks under such requirements is through constrained Neural Architecture Search (NAS). However, previous methods use complicated predictors for the accuracy of the network. Those predictors are hard to interpret and sensitive to many hyperparameters to be tuned; hence, the resulting accuracy of the generated models is often harmed. In this work we resolve this by introducing Bilinear Interpretable Neural Architecture Search (BINAS), which is based on an accurate and simple bilinear formulation of both an accuracy estimator and the expected resource requirement, together with a scalable search method with theoretical guarantees. The simplicity of our proposed estimator, together with the intuitive way it is constructed, brings interpretability through many insights about the contribution of different design choices. For example, we find that in the examined search space, adding depth and width is more effective at deeper stages of the network and at the beginning of each resolution stage. Our experiments show that BINAS generates architectures comparable to or better than those of other state-of-the-art NAS methods within a reduced marginal search cost, while strictly satisfying the resource constraints.  ( 2 min )
    Online Deep Learning from Doubly-Streaming Data. (arXiv:2204.11793v2 [cs.LG] UPDATED)
    This paper investigates a new online learning problem with doubly-streaming data, where the data streams are described by feature spaces that constantly evolve, with new features emerging and old features fading away. The challenges of this problem are twofold: 1) Data samples ceaselessly flowing in may carry shifted patterns over time, requiring learners to update and hence adapt on-the-fly. 2) Newly emerging features are described by very few samples, resulting in weak learners that tend to make erroneous predictions. A plausible idea to overcome the challenges is to establish a relationship between the pre- and post-evolving feature spaces, so that an online learner can leverage the knowledge learned from the old features to improve the learning performance on the new features. Unfortunately, this idea does not scale up to high-dimensional media streams with complex feature interplay, which suffer from a tradeoff between onlineness (biasing shallow learners) and expressiveness (requiring deep learners). Motivated by this, we propose a novel OLD^3S paradigm, where a shared latent subspace is discovered to summarize information from the old and new feature spaces, building an intermediate feature mapping relationship. A key trait of OLD^3S is to treat the model capacity as a learnable semantics, yielding optimal model depth and parameters jointly, in accordance with the complexity and non-linearity of the input data streams in an online fashion. Both theoretical analyses and empirical studies substantiate the viability and effectiveness of our proposal.  ( 2 min )
    TERMinator: A Neural Framework for Structure-Based Protein Design using Tertiary Repeating Motifs. (arXiv:2204.13048v1 [q-bio.BM])
    Computational protein design has the potential to deliver novel molecular structures, binders, and catalysts for myriad applications. Recent neural graph-based models that use backbone coordinate-derived features show exceptional performance on native sequence recovery tasks and are promising frameworks for design. A statistical framework for modeling protein sequence landscapes using Tertiary Motifs (TERMs), compact units of recurring structure in proteins, has also demonstrated good performance on protein design tasks. In this work, we investigate the use of TERM-derived data as features in neural protein design frameworks. Our graph-based architecture, TERMinator, incorporates TERM-based and coordinate-based information and outputs a Potts model over sequence space. TERMinator outperforms state-of-the-art models on native sequence recovery tasks, suggesting that utilizing TERM-based and coordinate-based features together is beneficial for protein design.  ( 2 min )
    An Empirical Evaluation of Flow Based Programming in the Machine Learning Deployment Context. (arXiv:2204.12781v1 [cs.SE])
    As the use of data-driven technologies spreads, software engineers are more often faced with the task of solving a business problem using data-driven methods such as machine learning (ML) algorithms. Deployment of ML within large software systems brings new challenges that are not addressed by standard engineering practices, and as a result businesses observe a high rate of ML deployment project failures. Data Oriented Architecture (DOA) is an emerging approach that can support data scientists and software developers when addressing such challenges. However, there is a lack of clarity about how DOA systems should be implemented in practice. This paper proposes to consider Flow-Based Programming (FBP) as a paradigm for creating DOA applications. We empirically evaluate FBP in the context of ML deployment on four applications that represent typical data science projects. We use Service Oriented Architecture (SOA) as a baseline for comparison. Evaluation is done with respect to different application domains, ML deployment stages, and code quality metrics. Results reveal that FBP is a suitable paradigm for data collection and data science tasks, and is able to simplify data collection and discovery when compared with SOA. We discuss the advantages of FBP as well as the gaps that need to be addressed to increase FBP adoption as a standard design paradigm for DOA.  ( 2 min )
    MAPLE-Edge: A Runtime Latency Predictor for Edge Devices. (arXiv:2204.12950v1 [cs.LG])
    Neural Architecture Search (NAS) has enabled automatic discovery of more efficient neural network architectures, especially for mobile and embedded vision applications. Although recent research has proposed ways of quickly estimating latency on unseen hardware devices with just a few samples, little focus has been given to the challenges of estimating latency on runtimes using optimized graphs, such as TensorRT and specifically for edge devices. In this work, we propose MAPLE-Edge, an edge device-oriented extension of MAPLE, the state-of-the-art latency predictor for general purpose hardware, where we train a regression network on architecture-latency pairs in conjunction with a hardware-runtime descriptor to effectively estimate latency on a diverse pool of edge devices. Compared to MAPLE, MAPLE-Edge can describe the runtime and target device platform using a much smaller set of CPU performance counters that are widely available on all Linux kernels, while still achieving up to +49.6% accuracy gains against previous state-of-the-art baseline methods on optimized edge device runtimes, using just 10 measurements from an unseen target device. We also demonstrate that unlike MAPLE which performs best when trained on a pool of devices sharing a common runtime, MAPLE-Edge can effectively generalize across runtimes by applying a trick of normalizing performance counters by the operator latency, in the measured hardware-runtime descriptor. Lastly, we show that for runtimes exhibiting lower than desired accuracy, performance can be boosted by collecting additional samples from the target device, with an extra 90 samples translating to gains of nearly +40%.  ( 2 min )
    Uncertainty-Aware Prediction of Battery Energy Consumption for Hybrid Electric Vehicles. (arXiv:2204.12825v1 [cs.LG])
    The usability of vehicles is highly dependent on their energy consumption. In particular, one of the main factors hindering the mass adoption of electric (EV), hybrid (HEV), and plug-in hybrid (PHEV) vehicles is range anxiety, which occurs when a driver is uncertain about the availability of energy for a given trip. To tackle this problem, we propose a machine learning approach for modeling the battery energy consumption. By reducing predictive uncertainty, this method can help increase trust in the vehicle's performance and thus boost its usability. Most related work focuses on physical and/or chemical models of the battery that affect the energy consumption. We propose a data-driven approach which relies on real-world datasets including battery related attributes. Our approach showed an improvement in terms of predictive uncertainty as well as in accuracy compared to traditional methods.  ( 2 min )
    Adaptable Text Matching via Meta-Weight Regulator. (arXiv:2204.12668v1 [cs.IR])
    Neural text matching models have been used in a range of applications such as question answering and natural language inference, and have yielded a good performance. However, these neural models are of a limited adaptability, resulting in a decline in performance when encountering test examples from a different dataset or even a different task. The adaptability is particularly important in the few-shot setting: in many cases, there is only a limited amount of labeled data available for a target dataset or task, while we may have access to a richly labeled source dataset or task. However, adapting a model trained on the abundant source data to a few-shot target dataset or task is challenging. To tackle this challenge, we propose a Meta-Weight Regulator (MWR), which is a meta-learning approach that learns to assign weights to the source examples based on their relevance to the target loss. Specifically, MWR first trains the model on the uniformly weighted source examples, and measures the efficacy of the model on the target examples via a loss function. By iteratively performing a (meta) gradient descent, high-order gradients are propagated to the source examples. These gradients are then used to update the weights of source examples, in a way that is relevant to the target performance. As MWR is model-agnostic, it can be applied to any backbone neural model. Extensive experiments are conducted with various backbone text matching models, on four widely used datasets and two tasks. The results demonstrate that our proposed approach significantly outperforms a number of existing adaptation methods and effectively improves the cross-dataset and cross-task adaptability of the neural text matching models in the few-shot setting.  ( 2 min )
    Topological Data Analysis for Anomaly Detection in Host-Based Logs. (arXiv:2204.12919v1 [cs.LG])
    Topological Data Analysis (TDA) gives practitioners the ability to analyse the global structure of cybersecurity data. We use TDA for anomaly detection in host-based logs collected with the open-source Logging Made Easy (LME) project. We present an approach that builds a filtration of simplicial complexes directly from Windows logs, enabling analysis of their intrinsic structure using topological tools. We compare the efficacy of persistent homology and the spectrum of graph and hypergraph Laplacians as feature vectors against a standard log embedding that counts events, and find that topological and spectral embeddings of computer logs contain discriminative information for classifying anomalous logs that is complementary to standard embeddings. We end by discussing the potential for our methods to be used as part of an explainable framework for anomaly detection.  ( 2 min )
    FlowGNN: A Dataflow Architecture for Universal Graph Neural Network Inference via Multi-Queue Streaming. (arXiv:2204.13103v1 [cs.DC])
    Graph neural networks (GNNs) have recently exploded in popularity thanks to their broad applicability to graph-related problems such as quantum chemistry, drug discovery, and high energy physics. However, meeting demand for novel GNN models and fast inference simultaneously is challenging because of the gap between developing efficient accelerators and the rapid creation of new GNN models. Prior art focuses on the acceleration of specific classes of GNNs, such as Graph Convolutional Network (GCN), but lacks the generality to support a wide range of existing or new GNN models. Meanwhile, most prior work relies on graph pre-processing to exploit data locality, making it unsuitable for real-time applications. To address these limitations, in this work, we propose a generic dataflow architecture for GNN acceleration, named FlowGNN, which can flexibly support the majority of message-passing GNNs. The contributions are three-fold. First, we propose a novel and scalable dataflow architecture, which flexibly supports a wide range of GNN models with a message-passing mechanism. The architecture features a configurable dataflow optimized for simultaneous computation of node embedding, edge embedding, and message passing, which is generally applicable to all models. We also propose a rich library of model-specific components. Second, we deliver ultra-fast real-time GNN inference without any graph pre-processing, making it agnostic to dynamically changing graph structures. Third, we verify our architecture on the Xilinx Alveo U50 FPGA board and measure the on-board end-to-end performance. We achieve a speed-up of up to 51-254x against CPU (6226R) and 1.3-477x against GPU (A6000) (with batch sizes 1 through 1024); we also outperform the SOTA GNN accelerator I-GCN by 1.03x and 1.25x across two datasets. Our implementation code and on-board measurement are publicly available on GitHub.  ( 2 min )
    Trainable Compound Activation Functions for Machine Learning. (arXiv:2204.12920v1 [cs.LG])
    Activation functions (AF) are necessary components of neural networks that allow approximation of functions, but AFs in current use are usually simple monotonically increasing functions. In this paper, we propose trainable compound AF (TCA) composed of a sum of shifted and scaled simple AFs. TCAs increase the effectiveness of networks with fewer parameters compared to added layers. TCAs have a special interpretation in generative networks because they effectively estimate the marginal distributions of each dimension of the data using a mixture distribution, reducing modality and making linear dimension reduction more effective. When used in restricted Boltzmann machines (RBMs), they result in a novel type of RBM with mixture-based stochastic units. Improved performance is demonstrated in experiments using RBMs, deep belief networks (DBN), projected belief networks (PBN), and variational auto-encoders (VAE).  ( 2 min )
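As a rough illustration of "a sum of shifted and scaled simple AFs", the sketch below composes several `tanh` units into one scalar activation. The parameterization (an output weight, an input scale and an input shift per component) is an assumption made for this example; the abstract does not specify the exact form, and in a real network these triples would be trainable parameters rather than fixed numbers.

```python
import math
from typing import Sequence, Tuple

# Each component is (weight, scale, shift): weight * tanh(scale * x + shift).
# This parameterization is assumed for illustration only.
Component = Tuple[float, float, float]


def compound_activation(x: float, components: Sequence[Component]) -> float:
    """Evaluate a compound activation built from shifted and scaled tanh units."""
    return sum(w * math.tanh(s * x + b) for w, s, b in components)
```

A single component `(1.0, 1.0, 0.0)` reduces to plain `tanh`, while two components with opposite shifts, such as `(1.0, 1.0, 1.0)` and `(1.0, 1.0, -1.0)`, give a function with a flatter region around zero, which hints at how the sum can represent multimodal structure.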
    Spending Privacy Budget Fairly and Wisely. (arXiv:2204.12903v1 [cs.LG])
    Differentially private (DP) synthetic data generation is a practical method for improving access to data as a means to encourage productive partnerships. One issue inherent to DP is that the "privacy budget" is generally "spent" evenly across features in the data set. This leads to good statistical parity with the real data, but can undervalue the conditional probabilities and marginals that are critical for predictive quality of synthetic data. Further, loss of predictive quality may be non-uniform across the data set, with subsets that correspond to minority groups potentially suffering a higher loss. In this paper, we develop ensemble methods that distribute the privacy budget "wisely" to maximize predictive accuracy of models trained on DP data, and "fairly" to bound potential disparities in accuracy across groups and reduce inequality. Our methods are based on the insights that feature importance can inform how privacy budget is allocated, and, further, that per-group feature importance and fairness-related performance objectives can be incorporated in the allocation. These insights make our methods tunable to social contexts, allowing data owners to produce balanced synthetic data for predictive analysis.  ( 2 min )
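One simple way to picture "feature importance can inform how privacy budget is allocated" is to split a total epsilon across features in proportion to an importance score. The sketch below does only that; the proportional rule, the function name `allocate_budget` and the example scores are assumptions for illustration and are not the ensemble method the paper develops.

```python
from typing import Dict


def allocate_budget(total_epsilon: float, importance: Dict[str, float]) -> Dict[str, float]:
    """Split `total_epsilon` across features proportionally to importance.

    Assumption: a plain proportional rule; features with zero importance
    receive zero budget. If every score is zero, the budget is spread evenly.
    """
    total = sum(importance.values())
    if total == 0:
        share = total_epsilon / len(importance)
        return {name: share for name in importance}
    return {name: total_epsilon * score / total for name, score in importance.items()}
```

For example, `allocate_budget(1.0, {"age": 1.0, "income": 3.0})` gives the more predictive feature three quarters of the budget, in contrast to the even split described in the abstract as the default.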
    First do no harm: counterfactual objective functions for safe & ethical AI. (arXiv:2204.12993v1 [cs.AI])
    To act safely and ethically in the real world, agents must be able to reason about harm and avoid harmful actions. In this paper we develop the first statistical definition of harm and a framework for factoring harm into algorithmic decisions. We argue that harm is fundamentally a counterfactual quantity, and show that standard machine learning algorithms are guaranteed to pursue harmful policies in certain environments. To resolve this, we derive a family of counterfactual objective functions that robustly mitigate for harm. We demonstrate our approach with a statistical model for identifying optimal drug doses. While identifying optimal doses using the causal treatment effect results in harmful treatment decisions, our counterfactual algorithm identifies doses that are far less harmful without sacrificing efficacy. Our results show that counterfactual reasoning is a key ingredient for safe and ethical AI.  ( 2 min )
    Ollivier-Ricci Curvature For Head Pose Estimation From a Single Image. (arXiv:2204.13006v1 [cs.CV])
    Head pose estimation is a crucial challenge for many real-world applications, such as attention and human behavior analysis. This paper aims to estimate head pose from a single image by applying notions of network curvature. In the real world, many complex networks have groups of nodes that are well connected to each other with significant functional roles. Similarly, the interactions of facial landmarks can be represented as complex dynamic systems modeled by weighted graphs. The functionalities of such systems are therefore intrinsically linked to the topology and geometry of the underlying graph. In this work, using the geometric notion of Ollivier-Ricci curvature (ORC) on weighted graphs as input to the XGBoost regression model, we show that the intrinsic geometric basis of ORC offers a natural approach to discovering underlying common structure within a pool of poses. Experiments on the BIWI, AFLW2000 and Pointing'04 datasets show that the ORC_XGB method performs well compared to state-of-the-art methods, both landmark-based and image-only.
    Scalable particle-based alternatives to EM. (arXiv:2204.12965v1 [stat.CO])
    Building on (Neal and Hinton, 1998), where the problem tackled by EM is recast as the optimization of a free energy functional on an infinite-dimensional space, we obtain three practical particle-based alternatives to EM applicable to broad classes of models. All three are derived through straightforward discretizations of gradient flows associated with the functional. The novel algorithms scale well to high-dimensional settings and outperform existing state-of-the-art methods in numerical experiments.
    Epicardial Adipose Tissue Segmentation from CT Images with A Semi-3D Neural Network. (arXiv:2204.12904v1 [eess.IV])
    Epicardial adipose tissue is a type of adipose tissue located between the heart wall and a protective layer around the heart called the pericardium. The volume and thickness of epicardial adipose tissue are linked to various cardiovascular diseases. It is shown to be an independent cardiovascular disease risk factor. Fully automatic and reliable measurements of epicardial adipose tissue from CT scans could provide better disease risk assessment and enable the processing of large CT image data sets for a systemic epicardial adipose tissue study. This paper proposes a method for fully automatic semantic segmentation of epicardial adipose tissue from CT images using a deep neural network. The proposed network uses a U-Net-based architecture with slice depth information embedded in the input image to segment a pericardium region of interest, which is used to obtain an epicardial adipose tissue segmentation. Image augmentation is used to increase model robustness. Cross-validation of the proposed method yields a Dice score of 0.86 on the CT scans of 20 patients.  ( 2 min )
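For readers unfamiliar with the reported metric, the Dice score between a predicted and a reference binary mask is twice the overlap divided by the total number of foreground pixels in both masks. A minimal sketch over flattened masks follows; the convention of returning 1.0 when both masks are empty is an assumption, since conventions differ.

```python
from typing import Sequence


def dice_score(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of elements")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    size = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if size == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / size
```

Usage: `dice_score([1, 1, 0, 0], [1, 0, 1, 0])` compares two masks that share one foreground pixel out of two each.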
    On the Dynamics of Inference and Learning. (arXiv:2204.12939v1 [cond-mat.dis-nn])
    Statistical inference is the process of determining a probability distribution over the space of parameters of a model given a data set. As more data becomes available, this probability distribution is updated via the application of Bayes' theorem. We present a treatment of this Bayesian updating process as a continuous dynamical system. Statistical inference is then governed by a first order differential equation describing a trajectory or flow in the information geometry determined by a parametric family of models. We solve this equation for some simple models and show that when the Cramér-Rao bound is saturated the learning rate is governed by a simple $1/T$ power-law, with $T$ a time-like variable denoting the quantity of data. The presence of hidden variables can be incorporated in this setting, leading to an additional driving term in the resulting flow equation. We illustrate this with both analytic and numerical examples based on Gaussians and Gaussian Random Processes and inference of the coupling constant in the 1D Ising model. Finally, we compare the qualitative behaviour exhibited by Bayesian flows to the training of various neural networks on benchmarked data sets such as MNIST and CIFAR10 and show that for networks exhibiting small final losses the simple power-law is also satisfied.  ( 2 min )
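The $1/T$ behaviour can be seen in the simplest textbook case: inferring the mean of a Gaussian with known noise variance under a Gaussian prior. The sketch below is that standard conjugate update, not the paper's flow equation; the function name and arguments are chosen for this example. The posterior variance is $1/(1/\tau_0^2 + T/\sigma^2)$, which approaches $\sigma^2/T$ as the number of observations $T$ grows.

```python
def posterior_variance(prior_var: float, noise_var: float, n_obs: int) -> float:
    """Posterior variance of a Gaussian mean after `n_obs` observations.

    Standard conjugate result: precisions add, so the posterior precision is
    1/prior_var + n_obs/noise_var.
    """
    return 1.0 / (1.0 / prior_var + n_obs / noise_var)


def scaled_variance(prior_var: float, noise_var: float, n_obs: int) -> float:
    """T times the posterior variance; tends to `noise_var` for large T."""
    return n_obs * posterior_variance(prior_var, noise_var, n_obs)
```

Evaluating `scaled_variance` for increasing `n_obs` shows the product settling at the noise variance, i.e. the uncertainty itself decays like $1/T$ once the prior has been washed out.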
    Improving Feature Generalizability with Multitask Learning in Class Incremental Learning. (arXiv:2204.12915v1 [cs.LG])
    Many deep learning applications, like keyword spotting, require the incorporation of new concepts (classes) over time, referred to as Class Incremental Learning (CIL). The major challenge in CIL is avoiding catastrophic forgetting, i.e., preserving as much of the old knowledge as possible while learning new tasks. Various techniques, such as regularization, knowledge distillation, and the use of exemplars, have been proposed to resolve this issue. However, prior works primarily focus on the incremental learning step, while ignoring the optimization during the base model training. We hypothesize that a more transferable and generalizable feature representation from the base model would be beneficial to incremental learning. In this work, we adopt multitask learning during base model training to improve the feature generalizability. Specifically, instead of training a single model with all the base classes, we decompose the base classes into multiple subsets and regard each of them as a task. These tasks are trained concurrently and a shared feature extractor is obtained for incremental learning. We evaluate our approach on two datasets under various configurations. The results show that our approach enhances the average incremental learning accuracy by up to 5.5%, which enables more reliable and accurate keyword spotting over time. Moreover, the proposed approach can be combined with many existing techniques and provides additional performance gain.  ( 2 min )
    Forecasting Foreign Exchange Rates With Parameter-Free Regression Networks Tuned By Bayesian Optimization. (arXiv:2204.12914v1 [q-fin.ST])
    The article is concerned with the problem of multi-step financial time series forecasting of Foreign Exchange (FX) rates. To address this problem, we introduce a parameter-free regression network termed RegPred Net. The exchange rate to forecast is treated as a stochastic process. It is assumed to follow a generalization of Brownian motion and the mean-reverting process referred to as the generalized Ornstein-Uhlenbeck (OU) process, with time-dependent coefficients. Using past observed values of the input time series, these coefficients can be regressed online by the cells of the first half of the network (Reg). The regressed coefficients depend only on - but are very sensitive to - a small number of hyperparameters that must be set by a global optimization procedure, for which Bayesian optimization is an adequate heuristic. Thanks to its multi-layered architecture, the second half of the regression network (Pred) can project time-dependent values for the OU process coefficients and generate realistic trajectories of the time series. Predictions can be easily derived in the form of expected values estimated by averaging values obtained by Monte Carlo simulation. The forecasting accuracy on a 100-day horizon is evaluated for several of the most important FX rates such as EUR/USD, EUR/CNY, and EUR/GBP. Our experimental results show that the RegPred Net significantly outperforms ARMA, ARIMA, LSTMs, and Autoencoder-LSTM models in this task.  ( 2 min )
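The last step described in the abstract above, turning simulated OU trajectories into a forecast by averaging, can be sketched as follows. This is not RegPred Net: it assumes constant coefficients `theta`, `mu`, and `sigma` (the paper regresses time-dependent ones) and a plain Euler-Maruyama discretization. All names and numbers are illustrative assumptions.

```python
import math
import random


def simulate_ou_path(x0, theta, mu, sigma, dt, n_steps, rng):
    """One Euler-Maruyama path of dX = theta * (mu - X) dt + sigma dW."""
    x = x0
    path = [x]
    for _ in range(n_steps):
        x += theta * (mu - x) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


def monte_carlo_forecast(x0, theta, mu, sigma, dt, n_steps, n_paths, seed=0):
    """Pointwise average of n_paths simulated trajectories (expected-value forecast)."""
    rng = random.Random(seed)
    totals = [0.0] * (n_steps + 1)
    for _ in range(n_paths):
        path = simulate_ou_path(x0, theta, mu, sigma, dt, n_steps, rng)
        for i, value in enumerate(path):
            totals[i] += value
    return [total / n_paths for total in totals]


forecast = monte_carlo_forecast(
    x0=1.10, theta=2.0, mu=1.00, sigma=0.05, dt=0.01, n_steps=100, n_paths=2000
)
# Closed-form mean of the constant-coefficient OU process at t = 1, for comparison.
analytic = 1.00 + (1.10 - 1.00) * math.exp(-2.0 * 1.0)
print(round(forecast[-1], 4), round(analytic, 4))
```

With constant coefficients the Monte Carlo average can be checked against the closed-form mean $\mu + (x_0 - \mu)e^{-\theta t}$; with time-dependent coefficients no such shortcut is generally available, which is where simulation becomes useful.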
    LiftPool: Lifting-based Graph Pooling for Hierarchical Graph Representation Learning. (arXiv:2204.12881v1 [cs.LG])
    Graph pooling has been increasingly considered for graph neural networks (GNNs) to facilitate hierarchical graph representation learning. Existing graph pooling methods commonly consist of two stages, i.e., selecting the top-ranked nodes and removing the remaining nodes to construct a coarsened graph representation. However, local structural information of the removed nodes is inevitably dropped in these methods, due to the inherent coupling of nodes (location) and their features (signals). In this paper, we propose an enhanced three-stage method via lifting, named LiftPool, to improve hierarchical graph representation by maximally preserving the local structural information in graph pooling. LiftPool introduces an additional stage of graph lifting before graph coarsening to preserve the local information of the removed nodes and decouple the processes of node removal and feature reduction. Specifically, for each node to be removed, its local information is obtained by subtracting the global information aggregated from its neighboring preserved nodes. Subsequently, this local information is aligned and propagated to the preserved nodes to alleviate information loss in graph coarsening. Furthermore, we demonstrate that the proposed LiftPool is localized and permutation-invariant. The proposed graph lifting structure is general enough to be integrated with existing downsampling-based graph pooling methods. Evaluations on benchmark graph datasets show that LiftPool substantially outperforms the state-of-the-art graph pooling methods in the task of graph classification.  ( 2 min )
    Performance and Interpretability Comparisons of Supervised Machine Learning Algorithms: An Empirical Study. (arXiv:2204.12868v1 [stat.ML])
    This paper compares the performances of three supervised machine learning algorithms in terms of predictive ability and model interpretation on structured or tabular data. The algorithms considered were scikit-learn implementations of extreme gradient boosting machines (XGB) and random forests (RFs), and feedforward neural networks (FFNNs) from TensorFlow. The paper is organized in a findings-based manner, with each section providing general conclusions supported by empirical results from simulation studies that cover a wide range of model complexity and correlation structures among predictors. We considered both continuous and binary responses of different sample sizes. Overall, XGB and FFNNs were competitive, with FFNNs showing better performance in smooth models and tree-based boosting algorithms performing better in non-smooth models. This conclusion held generally for predictive performance, identification of important variables, and determining correct input-output relationships as measured by partial dependence plots (PDPs). FFNNs generally had less over-fitting, as measured by the difference in performance between training and testing datasets. However, the difference with XGB was often small. RFs did not perform well in general, confirming the findings in the literature. All models exhibited different degrees of bias seen in PDPs, but the bias was especially problematic for RFs. The extent of the biases varied with correlation among predictors, response type, and data set sample size. In general, tree-based models tended to over-regularize the fitted model in the tails of predictor distributions. Finally, as to be expected, performances were better for continuous responses compared to binary data and with larger samples.  ( 2 min )
    Detecting Backdoor Poisoning Attacks on Deep Neural Networks by Heatmap Clustering. (arXiv:2204.12848v1 [cs.LG])
    Predictions made by neural networks can be fraudulently altered by so-called poisoning attacks. A special case is the backdoor poisoning attack. We study suitable detection methods and introduce a new method called Heatmap Clustering. There, we apply a $k$-means clustering algorithm on heatmaps produced by the state-of-the-art explainable AI method Layer-wise relevance propagation. The goal is to separate poisoned from un-poisoned data in the dataset. We compare this method with a similar method, called Activation Clustering, which also uses $k$-means clustering but applies it on the activation of certain hidden layers of the neural network as input. We test the performance of both approaches for standard backdoor poisoning attacks, label-consistent poisoning attacks and label-consistent poisoning attacks with reduced amplitude stickers. We show that Heatmap Clustering consistently performs better than Activation Clustering. However, when considering label-consistent poisoning attacks, the latter method also yields good detection performance.  ( 2 min )
    Accelerating Robot Learning of Contact-Rich Manipulations: A Curriculum Learning Study. (arXiv:2204.12844v1 [cs.RO])
    The Reinforcement Learning (RL) paradigm has been an essential tool for automating robotic tasks. Despite the advances in RL, it is still not widely adopted in industry because it requires a large and expensive amount of robot interaction with the environment. Curriculum Learning (CL) has been proposed to expedite learning. However, most research works have only been evaluated in simulated environments, from video games to robotic toy tasks. This paper presents a study for accelerating robot learning of contact-rich manipulation tasks based on Curriculum Learning combined with Domain Randomization (DR). We tackle complex industrial assembly tasks with position-controlled robots, such as insertion tasks. We compare different curricula designs and sampling approaches for DR. Based on this study, we propose a method that significantly outperforms previous work, which uses DR only (no CL), with less than a fifth of the training time (samples). Results also show that even when training only in simulation with toy tasks, our method can learn policies that can be transferred to the real-world robot. The learned policies achieved success rates of up to 86% on real-world complex industrial insertion tasks (with tolerances of $\pm 0.01$ mm) not seen during the training.  ( 2 min )
    Transfer Learning with Pre-trained Conditional Generative Models. (arXiv:2204.12833v1 [cs.LG])
    Transfer learning is crucial in training deep neural networks on new target tasks. Current transfer learning methods generally assume at least one of (i) source and target task label spaces must overlap, (ii) source datasets are available, and (iii) target network architectures are consistent with source ones. However, all of these assumptions are difficult to satisfy in practical settings because the target task rarely has the same labels as the source task, access to the source dataset is restricted due to licensing and storage costs, and the target architecture is often specialized to each task. To transfer source knowledge without these assumptions, we propose a transfer learning method that uses deep generative models and is composed of the following two stages: pseudo pre-training (PP) and pseudo semi-supervised learning (P-SSL). PP trains a target architecture with a synthesized dataset by using conditional source generative models. P-SSL applies SSL algorithms to labeled target data and unlabeled pseudo samples, which are generated by cascading the source classifier and generative models to condition them with target samples. Our experimental results indicate that our method can outperform baselines of scratch training and knowledge distillation.  ( 2 min )
    Learning to Parallelize in a Shared-Memory Environment with Transformers. (arXiv:2204.12835v1 [cs.DC])
    In past years, the world has switched to many-core and multi-core shared memory architectures. As a result, there is a growing need to utilize these architectures by introducing shared memory parallelization schemes to software applications. OpenMP is the most comprehensive API that implements such schemes, characterized by a readable interface. Nevertheless, introducing OpenMP into code is challenging due to pervasive pitfalls in management of parallel shared memory. To facilitate the performance of this task, many source-to-source (S2S) compilers have been created over the years, tasked with inserting OpenMP directives into code automatically. In addition to having limited robustness to their input format, these compilers still do not achieve satisfactory coverage and precision in locating parallelizable code and generating appropriate directives. In this work, we propose leveraging recent advances in ML techniques, specifically in natural language processing (NLP), to replace S2S compilers altogether. We create a database (corpus), Open-OMP, specifically for this goal. Open-OMP contains over 28,000 code snippets, half of which contain OpenMP directives while the other half do not need parallelization at all with high probability. We use the corpus to train systems to automatically classify code segments in need of parallelization, as well as suggest individual OpenMP clauses. We train several transformer models, named PragFormer, for these tasks, and show that they outperform statistically-trained baselines and automatic S2S parallelization compilers in both classifying the overall need for an OpenMP directive and the introduction of private and reduction clauses. Our source code and database are available at: https://github.com/Scientific-Computing-Lab-NRCN/PragFormer.  ( 2 min )
    Supervised Contrastive CSI Representation Learning for Massive MIMO Positioning. (arXiv:2204.12796v1 [cs.IT])
    A similarity metric is crucial for massive MIMO positioning utilizing channel state information (CSI). In this letter, we propose a novel massive MIMO CSI similarity learning method via deep convolutional neural network (DCNN) and contrastive learning. A contrastive loss function is designed considering multiple positive and negative CSI samples drawn from a training dataset. The DCNN encoder is trained using the loss so that positive samples are mapped to points close to the anchor's encoding, while encodings of negative samples are kept away from the anchor's in the representation space. Evaluation results of fingerprint-based positioning on a real-world CSI dataset show that the learned similarity metric improves positioning accuracy significantly compared with other known state-of-the-art methods.  ( 2 min )
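A loss that pulls several positives toward an anchor and pushes several negatives away, as the abstract above describes, can be sketched in a few lines. The letter's exact loss is not given in the abstract, so the version below is an assumption: a generic softmax-style (InfoNCE-like) contrastive loss over cosine similarities with a temperature parameter, applied to already-encoded vectors. `contrastive_loss` and `temperature` are hypothetical names.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(anchor, positives, negatives, temperature=0.1):
    """Average negative log-probability of each positive under a softmax
    taken over all positives and negatives (assumed form, not the letter's)."""
    pos_logits = [cosine(anchor, p) / temperature for p in positives]
    neg_logits = [cosine(anchor, n) / temperature for n in negatives]
    all_logits = pos_logits + neg_logits
    # Numerically stable log-sum-exp for the softmax denominator.
    top = max(all_logits)
    log_denominator = top + math.log(sum(math.exp(z - top) for z in all_logits))
    return -sum(z - log_denominator for z in pos_logits) / len(pos_logits)


anchor = [1.0, 0.0]
near = [[1.0, 0.1], [0.9, 0.0]]   # encodings close to the anchor
far = [[0.0, 1.0], [-1.0, 0.0]]   # encodings far from the anchor
print(contrastive_loss(anchor, near, far))   # small: positives already close
print(contrastive_loss(anchor, far, near))   # large: positives far from anchor
```

Minimizing such a loss with respect to the encoder's parameters moves positive encodings toward the anchor and negative encodings away from it, which is the behaviour the abstract attributes to the trained DCNN encoder.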
    GTNet: A Tree-Based Deep Graph Learning Architecture. (arXiv:2204.12802v1 [cs.LG])
    We propose Graph Tree Networks (GTNets), a deep graph learning architecture with a new general message passing scheme that originates from the tree representation of graphs. In the tree representation, messages propagate upward from the leaf nodes to the root node, and each node preserves its initial information prior to receiving information from its child nodes (neighbors). We formulate a general propagation rule following the nature of message passing in the tree to update a node's feature by aggregating its initial feature and its neighbor nodes' updated features. Two graph representation learning models are proposed within this GTNet architecture - Graph Tree Attention Network (GTAN) and Graph Tree Convolution Network (GTCN), with experimentally demonstrated state-of-the-art performance on several popular benchmark datasets. Unlike the vanilla Graph Attention Network (GAT) and Graph Convolution Network (GCN) which have the "over-smoothing" issue, the proposed GTAN and GTCN models can go deep as demonstrated by comprehensive experiments and rigorous theoretical analysis.  ( 2 min )
    When Performance is not Enough -- A Multidisciplinary View on Clinical Decision Support. (arXiv:2204.12810v1 [cs.LG])
    Scientific publications about machine learning in healthcare are often about implementing novel methods and boosting the performance - at least from a computer science perspective. However, beyond such often short-lived improvements, much more needs to be taken into consideration if we want to arrive at sustainable progress in healthcare. What does it take to actually implement such a system, make it usable for the domain expert, and possibly bring it into practical usage? Targeted at computer scientists, this work presents a multidisciplinary view on machine learning in medical decision support systems and covers information technology, medical, as well as ethical aspects. Along with an implemented risk prediction system in nephrology, challenges and lessons learned in a pilot project are presented.  ( 2 min )
    Learning Green's functions associated with parabolic partial differential equations. (arXiv:2204.12789v1 [math.NA])
    Given input-output pairs from a parabolic partial differential equation (PDE) in any spatial dimension $n\geq 1$, we derive the first theoretically rigorous scheme for learning the associated Green's function $G$. Until now, rigorously learning Green's functions associated with parabolic operators has been a major challenge in the field of scientific machine learning because $G$ may not be square-integrable when $n>1$, and time-dependent PDEs have transient dynamics. By combining the hierarchical low-rank structure of $G$ together with the randomized singular value decomposition, we construct an approximant to $G$ that achieves a relative error of $\smash{\mathcal{O}(\Gamma_\epsilon^{-1/2}\epsilon)}$ in the $L^1$-norm with high probability by using at most $\smash{\mathcal{O}(\epsilon^{-\frac{n+2}{2}}\log(1/\epsilon))}$ input-output training pairs, where $\Gamma_\epsilon$ is a measure of the quality of the training dataset for learning $G$, and $\epsilon>0$ is sufficiently small. Along the way, we extend the low-rank theory of Bebendorf and Hackbusch from elliptic PDEs in dimension $1\leq n\leq 3$ to parabolic PDEs in any dimensions, which shows that Green's functions associated with parabolic PDEs admit a low-rank structure on well-separated domains.  ( 2 min )
    SVD Perspectives for Augmenting DeepONet Flexibility and Interpretability. (arXiv:2204.12670v1 [cs.LG])
    Deep operator networks (DeepONets) are powerful architectures for fast and accurate emulation of complex dynamics. As their remarkable generalization capabilities are primarily enabled by their projection-based attribute, we investigate connections with low-rank techniques derived from the singular value decomposition (SVD). We demonstrate that some of the concepts behind proper orthogonal decomposition (POD)-neural networks can improve DeepONet's design and training phases. These ideas lead us to a methodology extension that we name SVD-DeepONet. Moreover, through multiple SVD analyses, we find that DeepONet inherits from its projection-based attribute strong inefficiencies in representing dynamics characterized by symmetries. Inspired by the work on shifted-POD, we develop flexDeepONet, an architecture enhancement that relies on a pre-transformation network for generating a moving reference frame and isolating the rigid components of the dynamics. In this way, the physics can be represented on a latent space free from rotations, translations, and stretches, and an accurate projection can be performed to a low-dimensional basis. In addition to flexibility and interpretability, the proposed perspectives increase DeepONet's generalization capabilities and computational efficiencies. For instance, we show flexDeepONet can accurately surrogate the dynamics of 19 variables in a combustion chemistry application by relying on 95% fewer trainable parameters than the vanilla architecture. We argue that DeepONet and SVD-based methods can reciprocally benefit from each other. In particular, the flexibility of the former in leveraging multiple data sources and multifidelity knowledge in the form of both unstructured data and physics-informed constraints has the potential to greatly extend the applicability of methodologies such as POD and PCA.  ( 2 min )
    Accelerated Continuous-Time Approximate Dynamic Programming via Data-Assisted Hybrid Control. (arXiv:2204.12707v1 [math.OC])
    We introduce a new closed-loop architecture for the online solution of approximate optimal control problems in the context of continuous-time systems. Specifically, we introduce the first algorithm that incorporates dynamic momentum in actor-critic structures to control continuous-time dynamic plants with an affine structure in the input. By incorporating dynamic momentum in our algorithm, we are able to accelerate the convergence properties of the closed-loop system, achieving superior transient performance compared to traditional gradient-descent based techniques. In addition, by leveraging the existence of past recorded data with sufficiently rich information properties, we dispense with the persistence of excitation condition traditionally imposed on the regressors of the critic and the actor. Given that our continuous-time momentum-based dynamics also incorporate periodic discrete-time resets that emulate restarting techniques used in the machine learning literature, we leverage tools from hybrid dynamical systems theory to establish asymptotic stability properties for the closed-loop system. We illustrate our results with a numerical example.  ( 2 min )
    Human-Centered Prior-Guided and Task-Dependent Multi-Task Representation Learning for Action Recognition Pre-Training. (arXiv:2204.12729v1 [cs.CV])
    Recently, much progress has been made in self-supervised action recognition. Most existing approaches emphasize the contrastive relations among videos, including appearance and motion consistency. However, two main issues remain for existing pre-training methods: 1) the learned representation is neutral and not informative for a specific task; 2) multi-task learning-based pre-training sometimes leads to sub-optimal solutions due to inconsistent domains of different tasks. To address the above issues, we propose a novel action recognition pre-training framework, which exploits human-centered prior knowledge to generate a more informative representation, and avoids the conflict between multiple tasks by using task-dependent representations. Specifically, we distill knowledge from a human parsing model to enrich the semantic capability of the representation. In addition, we combine knowledge distillation with contrastive learning to constitute a task-dependent multi-task framework. We achieve state-of-the-art performance on two popular benchmarks for the action recognition task, i.e., UCF101 and HMDB51, verifying the effectiveness of our method.  ( 2 min )
    The Multimarginal Optimal Transport Formulation of Adversarial Multiclass Classification. (arXiv:2204.12676v1 [cs.LG])
    We study a family of adversarial multiclass classification problems and provide equivalent reformulations in terms of: 1) a family of generalized barycenter problems introduced in the paper and 2) a family of multimarginal optimal transport problems where the number of marginals is equal to the number of classes in the original classification problem. These new theoretical results reveal a rich geometric structure of adversarial learning problems in multiclass classification and extend recent results restricted to the binary classification setting. A direct computational implication of our results is that by solving either the barycenter problem and its dual, or the MOT problem and its dual, we can recover the optimal robust classification rule and the optimal adversarial strategy for the original adversarial problem. Examples with synthetic and real data illustrate our results.  ( 2 min )
    A Multi-Head Convolutional Neural Network With Multi-path Attention improves Image Denoising. (arXiv:2204.12736v1 [cs.CV])
    Recently, convolutional neural networks (CNNs) and attention mechanisms have been widely used in image denoising and achieved satisfactory performance. However, the previous works mostly use a single head to receive the noisy image, limiting the richness of extracted features. Therefore, a novel CNN with multiple heads (MH) named MHCNN is proposed in this paper, whose heads will receive the input images rotated by different rotation angles. MH makes MHCNN simultaneously utilize features of rotated images to remove noise. We also present a novel multi-path attention mechanism (MPA) to integrate these features effectively. Unlike previous attention mechanisms that handle pixel-level, channel-level, and patch-level features, MPA focuses on features at the image level. Experiments show MHCNN surpasses other state-of-the-art CNN models on additive white Gaussian noise (AWGN) denoising and real-world image denoising. Its peak signal-to-noise ratio (PSNR) results are higher than other networks, such as DnCNN, BRDNet, RIDNet, PAN-Net, and CSANN. It is also demonstrated that the proposed MH with MPA mechanism can be used as a pluggable component.  ( 2 min )
    DraftRec: Personalized Draft Recommendation for Winning in Multi-Player Online Battle Arena Games. (arXiv:2204.12750v1 [cs.AI])
    This paper presents a personalized character recommendation system for Multiplayer Online Battle Arena (MOBA) games, which are among the most popular online video game genres around the world. When playing MOBA games, players go through a draft stage, where they alternately select a virtual character to play. When drafting, players select characters by considering not only their character preferences, but also the synergy and competence of their team's character combination. However, the complexity of drafting makes it difficult for beginners to choose appropriate characters based on the characters of their team while considering their own champion preferences. To alleviate this problem, we propose DraftRec, a novel hierarchical model which recommends characters by considering each player's champion preferences and the interaction between the players. DraftRec consists of two networks: the player network and the match network. The player network captures the individual player's champion preference, and the match network integrates the complex relationship between the players and their respective champions. We train and evaluate our model on 280,000 manually collected League of Legends matches and 50,000 publicly available Dota2 matches. Empirically, our method achieved state-of-the-art performance in character recommendation and match outcome prediction tasks. Furthermore, a comprehensive user survey confirms that DraftRec provides convincing and satisfying recommendations. Our code and dataset are available at https://github.com/dojeon-ai/DraftRec.  ( 2 min )
    Heterogeneous Ensemble Knowledge Transfer for Training Large Models in Federated Learning. (arXiv:2204.12703v1 [cs.LG])
    Federated learning (FL) enables edge-devices to collaboratively learn a model without disclosing their private data to a central aggregating server. Most existing FL algorithms require models of identical architecture to be deployed across the clients and server, making it infeasible to train large models due to clients' limited system resources. In this work, we propose a novel ensemble knowledge transfer method named Fed-ET in which small models (different in architecture) are trained on clients, and used to train a larger model at the server. Unlike in conventional ensemble learning, in FL the ensemble can be trained on clients' highly heterogeneous data. Cognizant of this property, Fed-ET uses a weighted consensus distillation scheme with diversity regularization that efficiently extracts reliable consensus from the ensemble while improving generalization by exploiting the diversity within the ensemble. We show the generalization bound for the ensemble of weighted models trained on heterogeneous datasets that supports the intuition of Fed-ET. Our experiments on image and language tasks show that Fed-ET significantly outperforms other state-of-the-art FL algorithms with fewer communicated parameters, and is also robust against high data-heterogeneity.  ( 2 min )
    Data-based price discrimination: information theoretic limitations and a minimax optimal strategy. (arXiv:2204.12723v1 [cs.GT])
    This paper studies the gap between the classical pricing theory and the data-based pricing theory. We focus on the problem of price discrimination with a continuum of buyer types based on a finite sample of observations. Our first set of results provides sharp lower bounds in the worst-case scenario for the discrepancy between any data-based pricing strategies and the theoretical optimal third-degree price discrimination (3PD) strategy (respectively, uniform pricing strategy) derived from the distribution (where the sample is drawn) ranging over a large class of distributions. Consequently, there is an inevitable gap between revenues based on any data-based pricing strategy and the revenue based on the theoretical optimal 3PD (respectively, uniform pricing) strategy. We then propose easy-to-implement data-based 3PD and uniform pricing strategies and show each strategy is minimax optimal in the sense that the gap between their respective revenue and the revenue based on the theoretical optimal 3PD (respectively, uniform pricing) strategy matches our worst-case lower bounds up to constant factors (that are independent of the sample size $n$). We show that 3PD strategies are revenue superior to uniform pricing strategies if and only if the sample size $n$ is large enough. In other words, if $n$ is below a threshold, uniform pricing strategies are revenue superior to 3PD strategies. We further provide upper bounds for the gaps between the welfare generated by our minimax optimal 3PD (respectively, uniform pricing) strategy and the welfare based on the theoretical optimal 3PD (respectively, uniform pricing) strategy.  ( 2 min )
    Relational Abstractions for Generalized Reinforcement Learning on Symbolic Problems. (arXiv:2204.12665v1 [cs.LG])
    Reinforcement learning in problems with symbolic state spaces is challenging due to the need for reasoning over long horizons. This paper presents a new approach that utilizes relational abstractions in conjunction with deep learning to learn a generalizable Q-function for such problems. The learned Q-function can be efficiently transferred to related problems that have different object names and object quantities, and thus, entirely different state spaces. We show that the learned generalized Q-function can be utilized for zero-shot transfer to related problems without an explicit, hand-coded curriculum. Empirical evaluations on a range of problems show that our method facilitates efficient zero-shot transfer of learned knowledge to much larger problem instances containing many objects.  ( 2 min )
    Machines of finite depth: towards a formalization of neural networks. (arXiv:2204.12786v1 [cs.LG])
    We provide a unifying framework where artificial neural networks and their architectures can be formally described as particular cases of a general mathematical construction--machines of finite depth. Unlike neural networks, machines have a precise definition, from which several properties follow naturally. Machines of finite depth are modular (they can be combined), efficiently computable and differentiable. The backward pass of a machine is again a machine and can be computed without overhead using the same procedure as the forward pass. We prove this statement theoretically and practically, via a unified implementation that generalizes several classical architectures--dense, convolutional, and recurrent neural networks with a rich shortcut structure--and their respective backpropagation rules.  ( 2 min )
    Understanding A Class of Decentralized and Federated Optimization Algorithms: A Multi-Rate Feedback Control Perspective. (arXiv:2204.12663v1 [cs.LG])
    Distributed algorithms have been playing an increasingly important role in many applications such as machine learning, signal processing, and control. Significant research efforts have been devoted to developing and analyzing new algorithms for various applications. In this work, we provide a fresh perspective to understand, analyze, and design distributed optimization algorithms. Through the lens of multi-rate feedback control, we show that a wide class of distributed algorithms, including popular decentralized/federated schemes such as decentralized gradient descent, gradient tracking, and federated averaging, can be viewed as discretizing a certain continuous-time feedback control system, possibly with multiple sampling rates. This key observation not only allows us to develop a generic framework to analyze the convergence of the entire algorithm class but, more importantly, also leads to an interesting way of designing new distributed algorithms. We develop the theory behind our framework and provide examples to highlight how the framework can be used in practice.  ( 2 min )
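Decentralized gradient descent, one of the schemes named in the abstract above, is short enough to sketch. The code below is not the paper's framework; it assumes scalar quadratic local objectives $f_i(x) = \tfrac{1}{2}(x - b_i)^2$, a four-node ring, and a hand-picked doubly stochastic mixing matrix. One common reading, stated here as an assumption, is that the update $x \leftarrow Wx - \alpha \nabla f(x)$ is a forward-Euler step of a continuous-time consensus-plus-gradient flow.

```python
def decentralized_gradient_descent(targets, mixing, step=0.1, n_iters=200):
    """Each node i holds a scalar x[i] and minimizes 0.5 * (x - targets[i])**2.
    Per iteration: average with neighbours (mixing), then take a local gradient step."""
    n = len(targets)
    x = [0.0] * n
    for _ in range(n_iters):
        mixed = [sum(mixing[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = [mixed[i] - step * (x[i] - targets[i]) for i in range(n)]
    return x


# Ring of four nodes; rows and columns each sum to 1 (doubly stochastic).
ring = [
    [0.50, 0.25, 0.00, 0.25],
    [0.25, 0.50, 0.25, 0.00],
    [0.00, 0.25, 0.50, 0.25],
    [0.25, 0.00, 0.25, 0.50],
]
targets = [1.0, 2.0, 3.0, 6.0]

x = decentralized_gradient_descent(targets, ring)
print([round(v, 3) for v in x], round(sum(x) / len(x), 6))
```

For these quadratics the network average of the iterates converges to the minimizer of the summed objective (the mean of `targets`), while individual nodes keep a step-size-dependent offset from it, the usual behaviour of constant-step decentralized gradient descent.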
    Gaussian Kernel Variance For an Adaptive Learning Method on Signals Over Graphs. (arXiv:2204.12629v1 [eess.SP])
    This paper discusses a special kind of a simple yet possibly powerful algorithm, called single-kernel Gradraker (SKG), which is an adaptive learning method predicting unknown nodal values in a network using known nodal values and the network structure. We aim to find out how to configure the special kind of the model in applying the algorithm. To be more specific, we focus on SKG with a Gaussian kernel and specify how to find a suitable variance for the kernel. To do so, we introduce two variables with which we are able to set up requirements on the variance of the Gaussian kernel to achieve (near-) optimal performance and can better understand how SKG works. Our contribution is that we introduce two variables as analysis tools, illustrate how predictions will be affected under different Gaussian kernels, and provide an algorithm finding a suitable Gaussian kernel for SKG with knowledge about the training network. Simulation results on real datasets are provided.  ( 2 min )
    Bounded Memory Adversarial Bandits with Composite Anonymous Delayed Feedback. (arXiv:2204.12764v1 [cs.LG])
    We study the adversarial bandit problem with composite anonymous delayed feedback. In this setting, losses of an action are split into $d$ components, spreading over consecutive rounds after the action is chosen. In each round, the algorithm observes the aggregation of losses that come from the latest $d$ rounds. Previous works focus on the oblivious adversarial setting, while we investigate the harder non-oblivious setting. We show that the non-oblivious setting incurs $\Omega(T)$ pseudo regret even when the loss sequence has bounded memory. However, we propose a wrapper algorithm which enjoys $o(T)$ policy regret on many adversarial bandit problems under the assumption that the loss sequence has bounded memory. In particular, for $K$-armed bandit and bandit convex optimization, we obtain an $\mathcal{O}(T^{2/3})$ policy regret bound. We also prove a matching lower bound for $K$-armed bandit. Our lower bound works even when the loss sequence is oblivious but the delay is non-oblivious. It answers the open problem proposed in \cite{wang2021adaptive}, showing that non-oblivious delay is enough to incur $\tilde{\Omega}(T^{2/3})$ regret.  ( 2 min )
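The feedback model in the abstract above is easy to misread, so here is a minimal sketch of what the learner actually observes. It is only the observation model, not the paper's wrapper algorithm, and it assumes the first component of a round's loss arrives in that same round. `observed_feedback` and the sample numbers are illustrative.

```python
def observed_feedback(component_losses):
    """component_losses[t][k] is the part of round t's loss that arrives at round t + k.
    Returns the aggregate the learner sees in each round; contributions from
    different rounds are summed and cannot be told apart (anonymous)."""
    n_rounds = len(component_losses)
    d = len(component_losses[0])
    observed = [0.0] * (n_rounds + d - 1)
    for t, parts in enumerate(component_losses):
        for k, part in enumerate(parts):
            observed[t + k] += part
    return observed


# Three rounds, each loss split into d = 2 components.
losses = [
    [0.5, 0.25],   # round 0: 0.5 arrives now, 0.25 arrives in round 1
    [0.125, 0.25], # round 1
    [0.5, 0.0],    # round 2
]
print(observed_feedback(losses))  # [0.5, 0.375, 0.75, 0.0]
```

The total loss is preserved, but the value observed in round 1 (0.375) mixes a delayed part of round 0 with the immediate part of round 1, which is why per-action credit assignment is harder here than in the standard bandit setting.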
    SCGC : Self-Supervised Contrastive Graph Clustering. (arXiv:2204.12656v1 [cs.LG])
    Graph clustering discovers groups or communities within networks. Deep learning methods such as autoencoders (AE) extract effective clustering and downstream representations but cannot incorporate rich structural information. While Graph Neural Networks (GNN) have shown great success in encoding graph structure, typical GNNs based on convolution or attention variants suffer from over-smoothing, noise, and heterophily, are computationally expensive, and typically require the complete graph to be present. Instead, we propose Self-Supervised Contrastive Graph Clustering (SCGC), which imposes graph structure via contrastive loss signals to learn discriminative node representations and iteratively refined soft cluster labels. We also propose SCGC*, with a more effective, novel Influence Augmented Contrastive (IAC) loss to fuse richer structural information, and half the original model parameters. SCGC(*) is faster with simple linear units and completely eliminates the convolutions and attention of traditional GNNs, yet efficiently incorporates structure. It is impervious to layer depth and robust to over-smoothing, incorrect edges and heterophily. It is scalable by batching, a limitation in many prior GNN models, and trivially parallelizable. We obtain significant improvements over the state of the art on a wide range of benchmark graph datasets, including images, sensor data, text, and citation networks. Specifically, 20% on ARI and 18% on NMI for DBLP; overall, a 55% reduction in training time and an 81% reduction in inference time. Our code is available at https://github.com/gayanku/SCGC  ( 2 min )
    Evaluation of Self-taught Learning-based Representations for Facial Emotion Recognition. (arXiv:2204.12624v1 [cs.CV])
    This work describes different strategies to generate unsupervised representations obtained through the concept of self-taught learning for facial emotion recognition (FER). The idea is to create complementary representations promoting diversity by varying the autoencoders' initialization, architecture, and training data. SVM, Bagging, Random Forest, and a dynamic ensemble selection method are evaluated as final classification methods. Experimental results on Jaffe and Cohn-Kanade datasets using a leave-one-subject-out protocol show that FER methods based on the proposed diverse representations compare favorably against state-of-the-art approaches that also explore unsupervised feature learning.  ( 2 min )
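    The leave-one-subject-out protocol mentioned in the abstract above can be sketched as follows. This is a generic illustration of the protocol, not the authors' evaluation code; the function name and the subject identifiers are hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple


def leave_one_subject_out(
    subject_ids: Sequence[str],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject.

    All samples of the held-out subject go to the test set, so no person
    appears in both training and test data within a fold.
    """
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


if __name__ == "__main__":
    # Hypothetical subject label for each image in a dataset.
    subjects = ["a", "b", "a", "c"]
    for held_out, train, test in leave_one_subject_out(subjects):
        print(held_out, train, test)
```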
    Meta-Learning Based Early Fault Detection for Rolling Bearings via Few-Shot Anomaly Detection. (arXiv:2204.12637v1 [cs.LG])
    Early fault detection (EFD) of rolling bearings can recognize slight deviations in health state and contribute to the stability of mechanical systems. In practice, very limited target bearing data are available to conduct EFD, which makes it hard to adapt to the EFD task for new bearings. To address this problem, many transfer-learning-based EFD methods utilize historical data to learn transferable domain knowledge and conduct early fault detection on new target bearings. However, most existing methods only consider the distribution drift across different working conditions but ignore the difference between bearings under the same working condition, which is called Unit-to-Unit Variability (UtUV). The setting of EFD with limited target data considering UtUV can be formulated as a Few-shot Anomaly Detection task. Therefore, this paper proposes a novel EFD method based on meta-learning that considers UtUV. The proposed method can learn a generic metric based on a Relation Network (RN) to measure the similarity between normal data and newly arriving target bearing data. In addition, the proposed method utilizes a health state embedding strategy to decrease false alarms. The performance of the proposed method is tested on two bearing datasets. The results show that the proposed method can detect incipient faults earlier than the baselines with fewer false alarms.  ( 2 min )
    Novel Applications for VAE-based Anomaly Detection Systems. (arXiv:2204.12577v1 [cs.LG])
    The recent rise in deep learning technologies fueled innovation and boosted scientific research. Their achievements enabled new research directions for deep generative modeling (DGM), an increasingly popular approach that can create novel and unseen data, starting from a given data set. As the technology shows promising applications, many ethical issues also arise. For example, their misuse can enable disinformation campaigns and powerful phishing attempts. Research also indicates different biases affect deep learning models, leading to social issues such as misrepresentation. In this work, we formulate a novel setting to deal with similar problems, showing that a repurposed anomaly detection system effectively generates novel data, avoiding generating specified unwanted data. We propose Variational Auto-encoding Binary Classifiers (V-ABC): a novel model that repurposes and extends the Auto-encoding Binary Classifier (ABC) anomaly detector, using the Variational Auto-encoder (VAE). We survey the limitations of existing approaches and explore many tools to show the model's inner workings in an interpretable way. This proposal has excellent potential for generative applications: models that rely on user-generated data could automatically filter out unwanted content, such as offensive language, obscene images, and misleading information.  ( 2 min )
    Self-scalable Tanh (Stan): Faster Convergence and Better Generalization in Physics-informed Neural Networks. (arXiv:2204.12589v1 [cs.LG])
    Physics-informed Neural Networks (PINNs) are gaining attention in the engineering and scientific literature for solving a range of differential equations with applications in weather modeling, healthcare, manufacturing, and so on. Poor scalability is one of the barriers to utilizing PINNs for many real-world problems. To address this, a Self-scalable tanh (Stan) activation function is proposed for the PINNs. The proposed Stan function is smooth, non-saturating, and has a trainable parameter. During training, it can allow easy flow of gradients to compute the required derivatives and also enable systematic scaling of the input-output mapping. It is also shown theoretically that the PINN with the proposed Stan function has no spurious stationary points when using gradient descent algorithms. The proposed Stan is tested on a couple of numerical studies involving general regression problems. It is subsequently used for solving multiple forward problems, which involve second-order derivatives and multiple dimensions, and an inverse problem where the thermal diffusivity is predicted through heat conduction in a rod. Our results of these case studies establish empirically that the Stan activation function can achieve better training and more accurate predictions than the state-of-the-art activation functions.  ( 2 min )
    Rate-Constrained Remote Contextual Bandits. (arXiv:2204.12620v1 [cs.LG])
    We consider a rate-constrained contextual multi-armed bandit (RC-CMAB) problem, in which a group of agents are solving the same contextual multi-armed bandit (CMAB) problem. However, the contexts are observed by a remotely connected entity, i.e., the decision-maker, that updates the policy to maximize the returned rewards, and communicates the arms to be sampled by the agents to a controller over a rate-limited communications channel. This framework can be applied to personalized ad placement, whenever the content owner observes the website visitors, and hence has the context, but needs to transmit the ads to be shown to a controller that is in charge of placing the marketing content. Consequently, the rate-constrained CMAB (RC-CMAB) problem requires the study of lossy compression schemes for the policy to be employed whenever the constraint on the channel rate does not allow the uncompressed transmission of the decision-maker's intentions. We characterize the fundamental information theoretic limits of this problem by letting the number of agents go to infinity, and study the regret that can be achieved, identifying the two distinct rate regions leading to linear and sub-linear regrets respectively. We then analyze the optimal compression scheme achievable in the limit with infinite agents, when using the forward and reverse KL divergence as distortion metric. Based on this, we also propose a practical coding scheme, and provide numerical results.  ( 2 min )
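    The two distortion metrics named in the abstract above differ only in argument order. The sketch below computes both for discrete distributions; the function name and the example policies are hypothetical, and which order the paper calls "forward" is an assumption (here, target first).

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in nats for two discrete distributions over the same arms."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            if qi == 0.0:
                return math.inf  # p puts mass where q has none
            total += pi * math.log(pi / qi)
    return total


if __name__ == "__main__":
    target = [0.7, 0.2, 0.1]      # hypothetical uncompressed policy over 3 arms
    compressed = [0.5, 0.3, 0.2]  # hypothetical policy after lossy compression
    print("forward KL(target || compressed):", kl_divergence(target, compressed))
    print("reverse KL(compressed || target):", kl_divergence(compressed, target))
```

    The two values generally differ, so the choice of direction changes which compressed policies count as low distortion.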
    RAMBO-RL: Robust Adversarial Model-Based Offline Reinforcement Learning. (arXiv:2204.12581v1 [cs.LG])
    Offline reinforcement learning (RL) aims to find near-optimal policies from logged data without further environment interaction. Model-based algorithms, which learn a model of the environment from the dataset and perform conservative policy optimisation within that model, have emerged as a promising approach to this problem. In this work, we present Robust Adversarial Model-Based Offline RL (RAMBO), a novel approach to model-based offline RL. To achieve conservatism, we formulate the problem as a two-player zero-sum game against an adversarial environment model. The model is trained to minimise the value function while still accurately predicting the transitions in the dataset, forcing the policy to act conservatively in areas not covered by the dataset. To approximately solve the two-player game, we alternate between optimising the policy and optimising the model adversarially. The problem formulation that we address is theoretically grounded, resulting in a PAC performance guarantee and a pessimistic value function which lower bounds the value function in the true environment. We evaluate our approach on widely studied offline RL benchmarks, and demonstrate that our approach achieves state-of-the-art performance.  ( 2 min )
    hate-alert@DravidianLangTech-ACL2022: Ensembling Multi-Modalities for Tamil TrollMeme Classification. (arXiv:2204.12587v1 [cs.MM])
    Social media platforms often act as breeding grounds for various forms of trolling or malicious content targeting users or communities. One way of trolling users is by creating memes, which in most cases unite an image with a short piece of text embedded on top of it. The situation is more complex for multilingual (e.g., Tamil) memes due to the lack of benchmark datasets and models. We explore several models to detect troll memes in Tamil based on the shared task "Troll Meme Classification in DravidianLangTech2022" at ACL-2022. We observe that while the text-based model MURIL performs better for non-troll meme classification, the image-based model VGG16 performs better for troll meme classification. Further, fusing these two modalities helps us achieve stable outcomes in both classes. Our fusion model achieved a 0.561 weighted average F1 score and ranked second in this task.  ( 2 min )
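    The weighted average F1 score reported in the abstract above is usually computed as the per-class F1 weighted by class support. The sketch below follows that common definition; it is not the shared task's official scorer, and the labels are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def weighted_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Support-weighted mean of per-class F1 scores."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1  # weight each class by its share of true labels
    return score


if __name__ == "__main__":
    truth = ["troll", "troll", "not", "not"]
    preds = ["troll", "not", "not", "not"]
    print(weighted_f1(truth, preds))
```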
    Protein 3D structure-based neural networks highly improve the accuracy in compound-protein binding affinity prediction. (arXiv:2204.12586v1 [q-bio.BM])
    Theoretically, the accuracy of computational models in predicting compound-protein binding affinities (CPAs) could be improved by the introduction of protein 3D structure information. However, most of these models still suffer from a low accuracy due to the lack of an efficient approach to encode informative protein features. The major challenge is how to combine the multi-modal information such as the residue sequence of the protein, residue atom coordinates and the torsion angles. To tackle this problem, we develop Fast Evolutional Attention and Thoroughgoing-graph Neural Networks (FeatNN) to facilitate the application of protein 3D structure information for predicting CPAs. Specifically, we established a novel end-to-end architecture to jointly embed torsion matrix, discrete distance matrix, and sequence information of protein and extract compound features with deep graph convolution layers. In addition, a new pairwise mapping attention mechanism is introduced to comprehensively learn potential interaction information between proteins and compounds. FeatNN considerably outperforms various state-of-the-art baselines in CPA prediction with the Pearson value elevated by about 35.7%. Thus, FeatNN provides an outstanding method for highly accurate CPA prediction and facilitates high-throughput virtual screening of drug candidates.  ( 2 min )
    Learning Eco-Driving Strategies at Signalized Intersections. (arXiv:2204.12561v1 [eess.SY])
    Signalized intersections in arterial roads result in persistent vehicle idling and excess accelerations, contributing to fuel consumption and CO2 emissions. There has thus been a line of work studying eco-driving control strategies to reduce fuel consumption and emission levels at intersections. However, methods to devise effective control strategies across a variety of traffic settings remain elusive. In this paper, we propose a reinforcement learning (RL) approach to learn effective eco-driving control strategies. We analyze the potential impact of a learned strategy on fuel consumption, CO2 emission, and travel time and compare with naturalistic driving and model-based baselines. We further demonstrate the generalizability of the learned policies under mixed traffic scenarios. Simulation results indicate that scenarios with 100% penetration of connected autonomous vehicles (CAV) may yield as high as 18% reduction in fuel consumption and 25% reduction in CO2 emission levels while even improving travel speed by 20%. Furthermore, results indicate that even 25% CAV penetration can bring at least 50% of the total fuel and emission reduction benefits.  ( 2 min )
    SoFaiR: Single Shot Fair Representation Learning. (arXiv:2204.12556v1 [cs.LG])
    To avoid discriminatory uses of their data, organizations can learn to map them into a representation that filters out information related to sensitive attributes. However, all existing methods in fair representation learning generate a fairness-information trade-off. To achieve different points on the fairness-information plane, one must train different models. In this paper, we first demonstrate that fairness-information trade-offs are fully characterized by rate-distortion trade-offs. Then, we use this key result and propose SoFaiR, a single shot fair representation learning method that generates with one trained model many points on the fairness-information plane. Besides its computational saving, our single-shot approach is, to the extent of our knowledge, the first fair representation learning method that explains what information is affected by changes in the fairness / distortion properties of the representation. Empirically, we find on three datasets that SoFaiR achieves similar fairness-information trade-offs as its multi-shot counterparts.  ( 2 min )
    Surrogate Assisted Evolutionary Multi-objective Optimisation applied to a Pressure Swing Adsorption system. (arXiv:2204.12585v1 [cs.NE])
    Chemical plant design and optimisation have proven challenging due to the complexity of these real-world systems. The resulting complexity translates into high computational costs for these systems' mathematical formulations and simulation models. Research has illustrated the benefits of using machine learning surrogate models as substitutes for computationally expensive models during optimisation. This paper extends recent research into optimising chemical plant design and operation. The study further explores Surrogate Assisted Genetic Algorithms (SA-GA) in more complex variants of the original plant design and optimisation problems, such as the inclusion of parallel and feedback components. The novel extension to the original algorithm proposed in this study, Surrogate Assisted NSGA-II (SA-NSGA), was tested on a popular literature case, the Pressure Swing Adsorption (PSA) system. We further provide extensive experimentation, comparing various meta-heuristic optimisation techniques and numerous machine learning models as surrogates. The results for both sets of systems illustrate the benefits of using Genetic Algorithms as an optimisation framework for complex chemical plant system design and optimisation for both single and multi-objective scenarios. We confirm that Random Forest surrogate assisted Evolutionary Algorithms can be scaled to increasingly complex chemical systems with parallel and feedback components. We further find that combining a Genetic Algorithm framework with Machine Learning Surrogate models as a substitute for long-running simulation models yields significant computational efficiency improvements, 1.7 - 1.84 times speedup for the increased complexity examples and a 2.7 times speedup for the Pressure Swing Adsorption system.  ( 2 min )
    Toward Policy Explanations for Multi-Agent Reinforcement Learning. (arXiv:2204.12568v1 [cs.AI])
    Advances in multi-agent reinforcement learning (MARL) enable sequential decision making for a range of exciting multi-agent applications such as cooperative AI and autonomous driving. Explaining agent decisions is crucial for improving system transparency, increasing user satisfaction, and facilitating human-agent collaboration. However, existing works on explainable reinforcement learning mostly focus on the single-agent setting and are not suitable for addressing challenges posed by multi-agent environments. We present novel methods to generate two types of policy explanations for MARL: (i) policy summarization about the agent cooperation and task sequence, and (ii) language explanations to answer queries about agent behavior. Experimental results on three MARL domains demonstrate the scalability of our methods. A user study shows that the generated explanations significantly improve user performance and increase subjective ratings on metrics such as user satisfaction.  ( 2 min )
    Process Knowledge-infused Learning for Suicidality Assessment on Social Media. (arXiv:2204.12560v1 [cs.AI])
    Improving the performance and natural language explanations of deep learning algorithms is a priority for adoption by humans in the real world. In several domains, such as healthcare, such technology has significant potential to reduce the burden on humans by providing quality assistance at scale. However, current methods rely on the traditional pipeline of predicting labels from data, thus completely ignoring the process and guidelines used to obtain the labels. Furthermore, post hoc explanations of the data-to-label prediction using explainable AI (XAI) models, while satisfactory to computer scientists, leave much to be desired for end-users because they lack explanations of the process in terms of human-understandable concepts. We introduce, formalize, and develop a novel Artificial Intelligence (AI) paradigm: Process Knowledge-infused Learning (PK-iL). PK-iL utilizes structured process knowledge that explicitly explains the underlying prediction process in a way that makes sense to end-users. The qualitative human evaluation confirms, through an annotator agreement of 0.72, that humans understand the explanations for the predictions. PK-iL also performs competitively with the state-of-the-art (SOTA) baselines.  ( 2 min )
    Data Bootstrapping Approaches to Improve Low Resource Abusive Language Detection for Indic Languages. (arXiv:2204.12543v1 [cs.CL])
    Abusive language is a growing concern in many social media platforms. Repeated exposure to abusive speech has created physiological effects on the target users. Thus, the problem of abusive language should be addressed in all forms for online peace and safety. While extensive research exists in abusive speech detection, most studies focus on English. Recently, many smearing incidents have occurred in India, which provoked diverse forms of abusive speech in online space in various languages based on the geographic location. Therefore it is essential to deal with such malicious content. In this paper, to bridge the gap, we demonstrate a large-scale analysis of multilingual abusive speech in Indic languages. We examine different interlingual transfer mechanisms and observe the performance of various multilingual models for abusive speech detection for eight different Indic languages. We also experiment to show how robust these models are on adversarial attacks. Finally, we conduct an in-depth error analysis by looking into the models' misclassified posts across various settings. We have made our code and models public for other researchers.  ( 2 min )
    Double Diffusion Maps and their Latent Harmonics for Scientific Computations in Latent Space. (arXiv:2204.12536v1 [stat.ML])
    We introduce a data-driven approach to building reduced dynamical models through manifold learning; the reduced latent space is discovered using Diffusion Maps (a manifold learning technique) on time series data. A second round of Diffusion Maps on those latent coordinates allows the approximation of the reduced dynamical models. This second round enables mapping the latent space coordinates back to the full ambient space (what is called lifting); it also enables the approximation of full state functions of interest in terms of the reduced coordinates. In our work, we develop and test three different reduced numerical simulation methodologies, either through pre-tabulation in the latent space and integration on the fly or by going back and forth between the ambient space and the latent space. The data-driven latent space simulation results, based on the three different approaches, are validated through (a) the latent space observation of the full simulation through the Nyström Extension formula, or through (b) lifting the reduced trajectory back to the full ambient space, via Latent Harmonics. Latent space modeling often involves additional regularization to favor certain properties of the space over others, and the mapping back to the ambient space is then constructed mostly independently from these properties; here, we use the same data-driven approach to construct the latent space and then map back to the ambient space.  ( 2 min )
    Identification of feasible pathway information for c-di-GMP binding proteins in cellulose production. (arXiv:2204.12526v1 [q-bio.QM])
    In this paper, we utilize a machine learning approach to identify the significant pathways for c-di-GMP signaling proteins. The dataset involves gene counts from 12 pathways and 5 essential c-di-GMP binding domains for 1024 bacterial genomes. Two novel approaches, Least absolute shrinkage and selection operator (Lasso) and Random forests, have been applied for analyzing and modeling the dataset. Both approaches show that bacterial chemotaxis is the most essential pathway for c-di-GMP encoding domains. Though popular for feature selection, the strong regularization of Lasso method fails to associate any pathway to MshE domain. Results from the analysis may help to understand and emphasize the supporting pathways involved in bacterial cellulose production. These findings demonstrate the need for a chassis to restrict the behavior or functionality by deactivating the selective pathways in cellulose production.  ( 2 min )
    Multi stain graph fusion for multimodal integration in pathology. (arXiv:2204.12541v1 [eess.IV])
    In pathology, tissue samples are assessed using multiple staining techniques to enhance contrast in unique histologic features. In this paper, we introduce a multimodal CNN-GNN based graph fusion approach that leverages complementary information from multiple non-registered histopathology images to predict pathologic scores. We demonstrate this approach in nonalcoholic steatohepatitis (NASH) by predicting CRN fibrosis stage and NAFLD Activity Score (NAS). Primary assessment of NASH typically requires liver biopsy evaluation on two histological stains: Trichrome (TC) and hematoxylin and eosin (H&E). Our multimodal approach learns to extract complementary information from TC and H&E graphs corresponding to each stain while simultaneously learning an optimal policy to combine this information. We report up to 20% improvement in predicting fibrosis stage and NAS component grades over single-stain modeling approaches, measured by computing linearly weighted Cohen's kappa between machine-derived vs. pathologist consensus scores. Broadly, this paper demonstrates the value of leveraging diverse pathology images for improved ML-powered histologic assessment.  ( 2 min )
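    The agreement metric named in the abstract above, linearly weighted Cohen's kappa, penalizes a disagreement in proportion to how many ordinal grades apart the two scores are. The sketch below implements the standard definition; the function name and the example grades are hypothetical, and it is not the authors' evaluation code.

```python
from typing import Sequence


def linear_weighted_kappa(a: Sequence[int], b: Sequence[int], num_classes: int) -> float:
    """Linearly weighted Cohen's kappa for ordinal labels in 0 .. num_classes-1."""
    n = len(a)
    observed = [[0] * num_classes for _ in range(num_classes)]
    for i, j in zip(a, b):
        observed[i][j] += 1
    row = [sum(r) for r in observed]
    col = [sum(observed[i][j] for i in range(num_classes)) for j in range(num_classes)]

    disagree_obs = 0.0  # weighted disagreement actually observed
    disagree_exp = 0.0  # weighted disagreement expected by chance
    for i in range(num_classes):
        for j in range(num_classes):
            w = abs(i - j)  # linear weight; the 1/(k-1) factor cancels in the ratio
            disagree_obs += w * observed[i][j]
            disagree_exp += w * row[i] * col[j] / n
    if disagree_exp == 0.0:
        raise ValueError("kappa is undefined when both raters use a single class")
    return 1.0 - disagree_obs / disagree_exp


if __name__ == "__main__":
    model = [0, 1, 2]        # hypothetical machine-derived stages
    consensus = [0, 1, 1]    # hypothetical pathologist consensus stages
    print(linear_weighted_kappa(model, consensus, num_classes=3))
```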
    Application of WGAN-GP in recommendation and Questioning the relevance of GAN-based approaches. (arXiv:2204.12527v1 [cs.IR])
    Many neural-based recommender systems were proposed in recent years and part of them used Generative Adversarial Networks (GAN) to model user-item interactions. However, the exploration of Wasserstein GAN with Gradient Penalty (WGAN-GP) on recommendation has received relatively less scrutiny. In this paper, we focus on two questions: 1- Can we successfully apply WGAN-GP on recommendation and does this approach give an advantage compared to the best GAN models? 2- Are GAN-based recommender systems relevant? To answer the first question, we propose a recommender system based on WGAN-GP called CFWGAN-GP which is founded on a previous model (CFGAN). We successfully applied our method on real-world datasets on the top-k recommendation task and the empirical results show that it is competitive with state-of-the-art GAN approaches, but we found no evidence of significant advantage of using WGAN-GP instead of the original GAN, at least from the accuracy point of view. As for the second question, we conduct a simple experiment in which we show that a well-tuned conceptually simpler method outperforms GAN-based models by a considerable margin, questioning the use of such models.  ( 2 min )
    Self-Supervised Information Bottleneck for Deep Multi-View Subspace Clustering. (arXiv:2204.12496v1 [cs.LG])
    In this paper, we explore the problem of deep multi-view subspace clustering framework from an information-theoretic point of view. We extend the traditional information bottleneck principle to learn common information among different views in a self-supervised manner, and accordingly establish a new framework called Self-supervised Information Bottleneck based Multi-view Subspace Clustering (SIB-MSC). Inheriting the advantages from information bottleneck, SIB-MSC can learn a latent space for each view to capture common information among the latent representations of different views by removing superfluous information from the view itself while retaining sufficient information for the latent representations of other views. Actually, the latent representation of each view provides a kind of self-supervised signal for training the latent representations of other views. Moreover, SIB-MSC attempts to learn the other latent space for each view to capture the view-specific information by introducing mutual information based regularization terms, so as to further improve the performance of multi-view subspace clustering. To the best of our knowledge, this is the first work to explore information bottleneck for multi-view subspace clustering. Extensive experiments on real-world multi-view data demonstrate that our method achieves superior performance over the related state-of-the-art methods.  ( 2 min )
    Enhancing Privacy against Inversion Attacks in Federated Learning by using Mixing Gradients Strategies. (arXiv:2204.12495v1 [cs.LG])
    Federated learning reduces the risk of information leakage, but remains vulnerable to attacks. We investigate how several neural network design decisions can defend against gradients inversion attacks. We show that overlapping gradients provides numerical resistance to gradient inversion on the highly vulnerable dense layer. Specifically, we propose to leverage batching to maximise mixing of gradients by choosing an appropriate loss function and drawing identical labels. We show that otherwise it is possible to directly recover all vectors in a mini-batch without any numerical optimisation due to the de-mixing nature of the cross entropy loss. To accurately assess data recovery, we introduce an absolute variation distance (AVD) metric for information leakage in images, derived from total variation. In contrast to standard metrics, e.g. Mean Squared Error or Structural Similarity Index, AVD offers a continuous metric for extracting information in noisy images. Finally, our empirical results on information recovery from various inversion attacks and training performance supports our defense strategies. These strategies are also shown to be useful for deep convolutional neural networks such as LeNET for image recognition. We hope that this study will help guide the development of further strategies that achieve a trustful federation policy.  ( 2 min )
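    Why a dense layer is so exposed can be seen in the single-sample case, a well-known observation sketched below in plain Python. For a layer z = Wx + b, the weight gradient is the outer product of the output error and the input, and the bias gradient is the output error itself, so dividing one row of the weight gradient by the matching bias gradient returns the input. The function names and values are hypothetical, and the mini-batch recovery claimed in the abstract above is not reproduced here.

```python
import math
from typing import List, Tuple


def dense_softmax_ce_grads(
    W: List[List[float]], b: List[float], x: List[float], label: int
) -> Tuple[List[List[float]], List[float]]:
    """Gradients of softmax cross-entropy w.r.t. W and b for one sample."""
    z = [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]
    m = max(z)
    e = [math.exp(v - m) for v in z]  # shift by the max for numerical stability
    s = sum(e)
    delta = [ei / s for ei in e]      # softmax probabilities
    delta[label] -= 1.0               # dL/dz = softmax(z) - onehot(label)
    grad_W = [[d * xi for xi in x] for d in delta]  # outer(delta, x)
    return grad_W, delta              # dL/db equals delta


def recover_input(grad_W: List[List[float]], grad_b: List[float]) -> List[float]:
    """Recover x from shared gradients: x = grad_W[i] / grad_b[i]."""
    i = max(range(len(grad_b)), key=lambda k: abs(grad_b[k]))  # best-conditioned row
    return [g / grad_b[i] for g in grad_W[i]]


if __name__ == "__main__":
    W = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1]]
    b = [0.0, 0.1]
    x = [1.0, -2.0, 0.5]  # private input the attacker never sees directly
    grad_W, grad_b = dense_softmax_ce_grads(W, b, x, label=0)
    print(recover_input(grad_W, grad_b))
```

    With several samples in a batch the gradients are summed, which is the mixing effect the abstract proposes to maximise as a defence.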
    AI-Assisted Authentication: State of the Art, Taxonomy and Future Roadmap. (arXiv:2204.12492v1 [cs.CR])
    Artificial Intelligence (AI) has found applications in a variety of environments ranging from data science to cybersecurity. AI helps break through the limitations of traditional algorithms and provides more efficient and flexible methods for solving problems. In this paper, we focus on the applications of artificial intelligence in authentication, which is used in a wide range of scenarios, including facial recognition to access buildings and keystroke dynamics to unlock smartphones. Given the emerging AI-assisted authentication schemes, our comprehensive survey provides a high-level overall understanding, which paves the way for future research in this area. In contrast to other relevant surveys, our research is the first of its kind to focus on the roles of AI in authentication.  ( 2 min )
    One-shot Federated Learning without Server-side Training. (arXiv:2204.12493v1 [cs.LG])
    Federated Learning (FL) has recently made significant progress as a new machine learning paradigm for privacy protection. Due to the high communication cost of traditional FL, one-shot federated learning is gaining popularity as a way to reduce communication cost between clients and the server. Most of the existing one-shot FL methods are based on Knowledge Distillation; however, distillation-based approaches require an extra training phase and depend on publicly available data sets. In this work, we consider a novel and challenging setting: performing a single round of parameter aggregation on the local models without server-side training on a public data set. In this new setting, we propose an effective algorithm for Model Aggregation via Exploring Common Harmonized Optima (MA-Echo), which iteratively updates the parameters of all local models to bring them close to a common low-loss area on the loss surface, without harming performance on their own data sets at the same time. Compared to the existing methods, MA-Echo can work well even in extremely non-identical data distribution settings where the support categories of each local model have no overlapping labels with those of the others. We conduct extensive experiments on two popular image classification data sets to compare the proposed method with existing methods and demonstrate the effectiveness of MA-Echo, which clearly outperforms the state of the art.  ( 2 min )
  • Open

    BINAS: Bilinear Interpretable Neural Architecture Search. (arXiv:2110.12399v3 [cs.LG] UPDATED)
    Practical use of neural networks often involves requirements on latency, energy and memory, among others. A popular approach to finding networks under such requirements is constrained Neural Architecture Search (NAS). However, previous methods use complicated predictors for the accuracy of the network. Those predictors are hard to interpret and sensitive to many hyperparameters that must be tuned; hence, the resulting accuracy of the generated models is often harmed. In this work we resolve this by introducing Bilinear Interpretable Neural Architecture Search (BINAS), which is based on an accurate and simple bilinear formulation of both an accuracy estimator and the expected resource requirement, together with a scalable search method with theoretical guarantees. The simplicity of our proposed estimator, together with the intuitive way it is constructed, brings interpretability through many insights about the contribution of different design choices. For example, we find that in the examined search space, adding depth and width is more effective at deeper stages of the network and at the beginning of each resolution stage. Our experiments show that BINAS generates architectures comparable to or better than those of other state-of-the-art NAS methods within a reduced marginal search cost, while strictly satisfying the resource constraints.  ( 2 min )
    Tensor decomposition for learning Gaussian mixtures from moments. (arXiv:2106.00555v2 [math.AG] UPDATED)
    In data processing and machine learning, an important challenge is to recover and exploit models that can accurately represent the data. We consider the problem of recovering Gaussian mixture models from datasets. We investigate symmetric tensor decomposition methods for tackling this problem, where the tensor is built from empirical moments of the data distribution. We consider identifiable tensors, which have a unique decomposition, showing that moment tensors built from spherical Gaussian mixtures have this property. We prove that symmetric tensors with interpolation degree strictly less than half their order are identifiable and we present an algorithm, based on simple linear algebra operations, to compute their decomposition. Illustrative experiments show the impact of the tensor decomposition method for recovering Gaussian mixtures, in comparison with other state-of-the-art approaches.
    Identification of feasible pathway information for c-di-GMP binding proteins in cellulose production. (arXiv:2204.12526v1 [q-bio.QM])
    In this paper, we utilize a machine learning approach to identify the significant pathways for c-di-GMP signaling proteins. The dataset involves gene counts from 12 pathways and 5 essential c-di-GMP binding domains for 1024 bacterial genomes. Two novel approaches, Least absolute shrinkage and selection operator (Lasso) and Random forests, have been applied for analyzing and modeling the dataset. Both approaches show that bacterial chemotaxis is the most essential pathway for c-di-GMP encoding domains. Though popular for feature selection, the strong regularization of Lasso method fails to associate any pathway to MshE domain. Results from the analysis may help to understand and emphasize the supporting pathways involved in bacterial cellulose production. These findings demonstrate the need for a chassis to restrict the behavior or functionality by deactivating the selective pathways in cellulose production.  ( 2 min )
    Efficient Learning of the Parameters of Non-Linear Models using Differentiable Resampling in Particle Filters. (arXiv:2111.01409v2 [stat.ML] UPDATED)
    It has been widely documented that the sampling and resampling steps in particle filters cannot be differentiated. The reparameterisation trick was introduced to allow the sampling step to be reformulated into a differentiable function. We extend the reparameterisation trick to include the stochastic input to resampling, thereby limiting the discontinuities in the gradient calculation after this step. Knowing the gradients of the prior and likelihood allows us to run particle Markov Chain Monte Carlo (p-MCMC) and use the No-U-Turn Sampler (NUTS) as the proposal when estimating parameters. We compare the Metropolis-adjusted Langevin algorithm (MALA), Hamiltonian Monte Carlo with different numbers of steps, and NUTS. We consider two state-space models and show that NUTS improves the mixing of the Markov chain and can produce more accurate results in less computational time.
    Knowledge Transfer in Engineering Fleets: Hierarchical Bayesian Modelling for Multi-Task Learning. (arXiv:2204.12404v1 [stat.ML] CROSS LISTED)
    We propose a population-level analysis to address issues of data sparsity when building predictive models of engineering infrastructure. By sharing information between similar assets, hierarchical Bayesian modelling is used to improve the survival analysis of a truck fleet (hazard curves) and power prediction in a wind farm (power curves). In each example, a set of correlated functions are learnt over the asset fleet, in a combined inference, to learn a population model. Parameter estimation is improved when sub-fleets of assets are allowed to share correlated information at different levels in the hierarchy. In turn, groups with incomplete data automatically borrow statistical strength from those that are data-rich. The correlations can be inspected to inform which assets share information for which effect (i.e. parameter).  ( 2 min )
    Compressed sensing of low-rank plus sparse matrices. (arXiv:2007.09457v2 [math.NA] UPDATED)
    Expressing a matrix as the sum of a low-rank matrix plus a sparse matrix is a flexible model capturing global and local features in data popularized as Robust PCA (Candes et al., 2011; Chandrasekaran et al., 2009). Compressed sensing, matrix completion, and their variants (Eldar and Kutyniok, 2012; Foucart and Rauhut, 2013) have established that data satisfying low complexity models can be efficiently measured and recovered from a number of measurements proportional to the model complexity rather than the ambient dimension. This manuscript develops similar guarantees showing that $m\times n$ matrices that can be expressed as the sum of a rank-$r$ matrix and a $s$-sparse matrix can be recovered by computationally tractable methods from $\mathcal{O}(r(m+n-r)+s)\log(mn/s)$ linear measurements. More specifically, we establish that the low-rank plus sparse matrix set is closed provided the incoherence of the low-rank component is upper bounded as $\mu<\sqrt{mn}/(r\sqrt{s})$, and subsequently, the restricted isometry constants for the aforementioned matrices remain bounded independent of problem size provided $p/mn$, $s/p$, and $r(m+n-r)/p$ remain fixed. Additionally, we show that semidefinite programming and two hard threshold gradient descent algorithms, NIHT and NAHT, converge to the measured matrix provided the measurement operator's RIC's are sufficiently small. These results also provably solve convex and non-convex formulation of Robust PCA with the asymptotically optimal fraction of corruptions $\alpha=\mathcal{O}\left(1/(\mu r) \right)$, where $s = \alpha^2 mn$, and improve the previously best known guarantees by not requiring that the fraction of corruptions is spread in every column and row by being upper bounded by $\alpha$. Numerical experiments illustrating these results are shown for synthetic problems, dynamic-foreground/static-background separation, and multispectral imaging.  ( 2 min )
    Differentially Quantized Gradient Methods. (arXiv:2002.02508v4 [cs.LG] UPDATED)
    Consider the following distributed optimization scenario. A worker has access to training data that it uses to compute the gradients while a server decides when to stop iterative computation based on its target accuracy or delay constraints. The server receives all its information about the problem instance from the worker via a rate-limited noiseless communication channel. We introduce the principle we call Differential Quantization (DQ) that prescribes compensating the past quantization errors to direct the descent trajectory of a quantized algorithm towards that of its unquantized counterpart. Assuming that the objective function is smooth and strongly convex, we prove that Differentially Quantized Gradient Descent (DQ-GD) attains a linear contraction factor of $\max\{\sigma_{\mathrm{GD}}, \rho_n 2^{-R}\}$, where $\sigma_{\mathrm{GD}}$ is the contraction factor of unquantized gradient descent (GD), $\rho_n \geq 1$ is the covering efficiency of the quantizer, and $R$ is the bitrate per problem dimension $n$. Thus at any $R\geq\log_2 \rho_n /\sigma_{\mathrm{GD}}$ bits, the contraction factor of DQ-GD is the same as that of unquantized GD, i.e., there is no loss due to quantization. We show that no algorithm within a certain class can converge faster than $\max\{\sigma_{\mathrm{GD}}, 2^{-R}\}$. Since quantizers exist with $\rho_n \to 1$ as $n \to \infty$ (Rogers, 1963), this means that DQ-GD is asymptotically optimal. The principle of differential quantization continues to apply to gradient methods with momentum such as Nesterov's accelerated gradient descent, and Polyak's heavy ball method. For these algorithms as well, if the rate is above a certain threshold, there is no loss in contraction factor obtained by the differentially quantized algorithm compared to its unquantized counterpart. Experimental results on least-squares problems validate our theoretical analysis.  ( 3 min )
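    The bitrate threshold quoted in the abstract follows directly from the stated contraction factor, reading the threshold as the base-2 logarithm of the ratio $\rho_n/\sigma_{\mathrm{GD}}$ (an interpretation of the abstract's notation, not a statement taken from the paper body):

```latex
\[
\max\{\sigma_{\mathrm{GD}},\, \rho_n 2^{-R}\} = \sigma_{\mathrm{GD}}
\iff \rho_n 2^{-R} \le \sigma_{\mathrm{GD}}
\iff 2^{R} \ge \frac{\rho_n}{\sigma_{\mathrm{GD}}}
\iff R \ge \log_2 \frac{\rho_n}{\sigma_{\mathrm{GD}}}.
\]
```

    Because $\rho_n \geq 1$ and $\sigma_{\mathrm{GD}} < 1$ for a contraction, the threshold is strictly positive; as $\rho_n \to 1$ it approaches $\log_2 (1/\sigma_{\mathrm{GD}})$.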
    Neural Collapse Inspired Attraction-Repulsion-Balanced Loss for Imbalanced Learning. (arXiv:2204.08735v2 [cs.LG] UPDATED)
    Class-imbalanced distributions are common in real-world engineering. However, mainstream optimization algorithms that seek to minimize error trap deep learning models in sub-optimal solutions when facing extreme class imbalance. This seriously harms classification precision, especially on the minority classes. The essential reason is that the gradients of the classifier weights are imbalanced among the components from different classes. In this paper, we propose Attraction-Repulsion-Balanced Loss (ARB-Loss) to balance the different components of the gradients. We perform experiments on large-scale classification and segmentation datasets, and our ARB-Loss achieves state-of-the-art performance with only one-stage training instead of the two-stage learning used by current SOTA works.
    The Multimarginal Optimal Transport Formulation of Adversarial Multiclass Classification. (arXiv:2204.12676v1 [cs.LG])
    We study a family of adversarial multiclass classification problems and provide equivalent reformulations in terms of: 1) a family of generalized barycenter problems introduced in the paper and 2) a family of multimarginal optimal transport problems where the number of marginals is equal to the number of classes in the original classification problem. These new theoretical results reveal a rich geometric structure of adversarial learning problems in multiclass classification and extend recent results restricted to the binary classification setting. A direct computational implication of our results is that by solving either the barycenter problem and its dual, or the MOT problem and its dual, we can recover the optimal robust classification rule and the optimal adversarial strategy for the original adversarial problem. Examples with synthetic and real data illustrate our results.
    Double Diffusion Maps and their Latent Harmonics for Scientific Computations in Latent Space. (arXiv:2204.12536v1 [stat.ML])
    We introduce a data-driven approach to building reduced dynamical models through manifold learning; the reduced latent space is discovered using Diffusion Maps (a manifold learning technique) on time series data. A second round of Diffusion Maps on those latent coordinates allows the approximation of the reduced dynamical models. This second round enables mapping the latent space coordinates back to the full ambient space (what is called lifting); it also enables the approximation of full state functions of interest in terms of the reduced coordinates. In our work, we develop and test three different reduced numerical simulation methodologies, either through pre-tabulation in the latent space and integration on the fly or by going back and forth between the ambient space and the latent space. The data-driven latent space simulation results, based on the three different approaches, are validated through (a) the latent space observation of the full simulation through the Nyström Extension formula, or through (b) lifting the reduced trajectory back to the full ambient space, via Latent Harmonics. Latent space modeling often involves additional regularization to favor certain properties of the space over others, and the mapping back to the ambient space is then constructed mostly independently from these properties; here, we use the same data-driven approach to construct the latent space and then map back to the ambient space.
    Bounded Memory Adversarial Bandits with Composite Anonymous Delayed Feedback. (arXiv:2204.12764v1 [cs.LG])
    We study the adversarial bandit problem with composite anonymous delayed feedback. In this setting, the losses of an action are split into $d$ components, spread over consecutive rounds after the action is chosen. In each round, the algorithm observes the aggregation of losses that come from the latest $d$ rounds. Previous works focus on the oblivious adversarial setting, while we investigate the harder non-oblivious setting. We show that the non-oblivious setting incurs $\Omega(T)$ pseudo regret even when the loss sequence has bounded memory. However, we propose a wrapper algorithm which enjoys $o(T)$ policy regret on many adversarial bandit problems under the assumption that the loss sequence has bounded memory. In particular, for $K$-armed bandits and bandit convex optimization, we obtain an $\mathcal{O}(T^{2/3})$ policy regret bound. We also prove a matching lower bound for $K$-armed bandits. Our lower bound holds even when the loss sequence is oblivious but the delay is non-oblivious. It answers the open problem proposed in \cite{wang2021adaptive}, showing that non-oblivious delay is enough to incur $\tilde{\Omega}(T^{2/3})$ regret.
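    The feedback model described in this abstract can be illustrated with a small sketch. It assumes that the $k$-th component of the loss incurred in round $t$ lands in round $t + k$ for $k = 0, \dots, d-1$; that indexing, the function name `observed_feedback`, and the numbers are illustrative assumptions, not taken from the paper.

```python
def observed_feedback(component_losses):
    """Aggregate composite, anonymous, delayed losses per round.

    component_losses[t][k] is the part of round t's loss that lands in
    round t + k (an assumed indexing). The learner only sees the per-round
    sum, with no indication of which earlier action produced each part.
    """
    horizon = len(component_losses)
    observed = [0.0] * horizon
    for t, parts in enumerate(component_losses):
        for k, part in enumerate(parts):
            if t + k < horizon:  # parts landing after the horizon are never seen
                observed[t + k] += part
    return observed


# Hypothetical losses with d = 3 components per action.
losses = [
    [0.5, 0.25, 0.25],
    [0.0, 1.0, 0.0],
    [0.2, 0.2, 0.1],
    [0.3, 0.0, 0.0],
]
obs = observed_feedback(losses)
print(obs)
```

    The learner's difficulty is visible here: the value observed in round 2 mixes contributions from the actions taken in rounds 0, 1 and 2.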
    Performance and Interpretability Comparisons of Supervised Machine Learning Algorithms: An Empirical Study. (arXiv:2204.12868v1 [stat.ML])
    This paper compares the performances of three supervised machine learning algorithms in terms of predictive ability and model interpretation on structured or tabular data. The algorithms considered were scikit-learn implementations of extreme gradient boosting machines (XGB) and random forests (RFs), and feedforward neural networks (FFNNs) from TensorFlow. The paper is organized in a findings-based manner, with each section providing general conclusions supported by empirical results from simulation studies that cover a wide range of model complexity and correlation structures among predictors. We considered both continuous and binary responses of different sample sizes. Overall, XGB and FFNNs were competitive, with FFNNs showing better performance in smooth models and tree-based boosting algorithms performing better in non-smooth models. This conclusion held generally for predictive performance, identification of important variables, and determining correct input-output relationships as measured by partial dependence plots (PDPs). FFNNs generally had less over-fitting, as measured by the difference in performance between training and testing datasets. However, the difference with XGB was often small. RFs did not perform well in general, confirming the findings in the literature. All models exhibited different degrees of bias seen in PDPs, but the bias was especially problematic for RFs. The extent of the biases varied with correlation among predictors, response type, and data set sample size. In general, tree-based models tended to over-regularize the fitted model in the tails of predictor distributions. Finally, as to be expected, performances were better for continuous responses compared to binary data and with larger samples.
    Transfer Learning with Pre-trained Conditional Generative Models. (arXiv:2204.12833v1 [cs.LG])
    Transfer learning is crucial in training deep neural networks on new target tasks. Current transfer learning methods generally assume at least one of the following: (i) source and target task label spaces overlap, (ii) source datasets are available, and (iii) target network architectures are consistent with source ones. However, all of these assumptions are difficult to satisfy in practical settings because the target task rarely has the same labels as the source task, the source dataset access is restricted due to licensing and storage costs, and the target architecture is often specialized to each task. To transfer source knowledge without these assumptions, we propose a transfer learning method that uses deep generative models and is composed of the following two stages: pseudo pre-training (PP) and pseudo semi-supervised learning (P-SSL). PP trains a target architecture with a synthesized dataset by using conditional source generative models. P-SSL applies SSL algorithms to labeled target data and unlabeled pseudo samples, which are generated by cascading the source classifier and generative models to condition them with target samples. Our experimental results indicate that our method can outperform baselines of scratch training and knowledge distillation.
    An Empirical Study of the Occurrence of Heavy-Tails in Training a ReLU Gate. (arXiv:2204.12554v1 [cs.LG])
    One direction of recent progress on stochastic deep-learning algorithms has been uncovering a rather mysterious heavy-tailed nature of the stationary distribution of these algorithms, even when the data distribution is not heavy-tailed. Moreover, the heavy-tail index is known to show interesting dependence on the input dimension of the net, the mini-batch size and the step size of the algorithm. In this short note, we undertake an experimental study of this index for S.G.D. while training a ReLU gate (in the realizable and in the binary classification setup) and for a variant of S.G.D. that was proven in Karmakar and Mukherjee (2022) for ReLU realizable data. From our experiments we conjecture that these two algorithms have similar heavy-tail behaviour on any data where the latter can be proven to converge. Secondly, we demonstrate that the heavy-tail index of the late time iterates in this model scenario has strikingly different properties than either what has been proven for linear hypothesis classes or what has been previously demonstrated for large nets.
    Variational Kalman Filtering with Hinf-Based Correction for Robust Bayesian Learning in High Dimensions. (arXiv:2204.13089v1 [stat.ML])
    In this paper, we address the problem of convergence of the sequential variational inference filter (VIF) through the application of a robust variational objective and Hinf-norm based correction for a linear Gaussian system. As the dimension of the state or parameter space grows, performing the full Kalman update with the dense covariance matrix for a large-scale system requires increased storage and computational complexity, making it impractical. The VIF approach, based on mean-field Gaussian variational inference, reduces this burden through a variational approximation to the covariance, usually in the form of a diagonal covariance approximation. The challenge is to retain convergence and correct for biases introduced by the sequential VIF steps. We desire a framework that improves feasibility while still maintaining reasonable proximity to the optimal Kalman filter as data is assimilated. To accomplish this goal, an Hinf-norm based optimization perturbs the VIF covariance matrix to improve robustness. This yields a novel VIF-Hinf recursion that employs consecutive variational inference and Hinf-based optimization steps. We explore the development of this method and investigate a numerical example to illustrate the effectiveness of the proposed filter.
    Accurate inference of crowdsourcing properties when using efficient allocation strategies. (arXiv:1903.03104v2 [cs.LG] UPDATED)
    Allocation strategies improve the efficiency of crowdsourcing by decreasing the work needed to complete individual tasks accurately. However, these algorithms introduce bias by preferentially allocating workers onto easy tasks, leading to sets of completed tasks that are no longer representative of all tasks. This bias challenges inference of problem-wide properties such as typical task difficulty or crowd properties such as worker completion times, important information that goes beyond the crowd responses themselves. Here we study inference about problem properties when using an allocation algorithm to improve crowd efficiency. We introduce Decision-Explicit Probability Sampling (DEPS), a novel method to perform inference of problem properties while accounting for the potential bias introduced by an allocation strategy. Experiments on real and synthetic crowdsourcing data show that DEPS outperforms baseline inference methods while still leveraging the efficiency gains of the allocation method. The ability to perform accurate inference of general properties when using non-representative data allows crowdsourcers to extract more knowledge out of a given crowdsourced dataset.  ( 2 min )
    Faster online calibration without randomization: interval forecasts and the power of two choices. (arXiv:2204.13087v1 [cs.LG])
    We study the problem of making calibrated probabilistic forecasts for a binary sequence generated by an adversarial nature. Following the seminal paper of Foster and Vohra (1998), nature is often modeled as an adaptive adversary who sees all activity of the forecaster except the randomization that the forecaster may deploy. A number of papers have proposed randomized forecasting strategies that achieve an $\epsilon$-calibration error rate of $O(1/\sqrt{T})$, which we prove is tight in general. On the other hand, it is well known that it is not possible to be calibrated without randomization, or if nature also sees the forecaster's randomization; in both cases the calibration error could be $\Omega(1)$. Inspired by the equally seminal works on the "power of two choices" and imprecise probability theory, we study a small variant of the standard online calibration problem. The adversary gives the forecaster the option of making two nearby probabilistic forecasts, or equivalently an interval forecast of small width, and the endpoint closest to the revealed outcome is used to judge calibration. This power of two choices, or imprecise forecast, accords the forecaster with significant power -- we show that a faster $\epsilon$-calibration rate of $O(1/T)$ can be achieved even without deploying any randomization.  ( 2 min )
    Scalable particle-based alternatives to EM. (arXiv:2204.12965v1 [stat.CO])
    Building on (Neal and Hinton, 1998), where the problem tackled by EM is recast as the optimization of a free energy functional on an infinite-dimensional space, we obtain three practical particle-based alternatives to EM applicable to broad classes of models. All three are derived through straightforward discretizations of gradient flows associated with the functional. The novel algorithms scale well to high-dimensional settings and outperform existing state-of-the-art methods in numerical experiments.  ( 2 min )
    Closing the Gap between Single-User and Multi-User VoiceFilter-Lite. (arXiv:2202.12169v2 [eess.AS] UPDATED)
    VoiceFilter-Lite is a speaker-conditioned voice separation model that plays a crucial role in improving speech recognition and speaker verification by suppressing overlapping speech from non-target speakers. However, one limitation of VoiceFilter-Lite, and other speaker-conditioned speech models in general, is that these models are usually limited to a single target speaker. This is undesirable as most smart home devices now support multiple enrolled users. In order to extend the benefits of personalization to multiple users, we previously developed an attention-based speaker selection mechanism and applied it to VoiceFilter-Lite. However, the original multi-user VoiceFilter-Lite model suffers from significant performance degradation compared with single-user models. In this paper, we devised a series of experiments to improve the multi-user VoiceFilter-Lite model. By incorporating a dual learning rate schedule and by using feature-wise linear modulation (FiLM) to condition the model with the attended speaker embedding, we successfully closed the performance gap between multi-user and single-user VoiceFilter-Lite models on single-speaker evaluations. At the same time, the new model can also be easily extended to support any number of users, and significantly outperforms our previously published model on multi-speaker evaluations.  ( 2 min )
    IH-GAN: A Conditional Generative Model for Implicit Surface-Based Inverse Design of Cellular Structures. (arXiv:2103.02588v4 [cs.CE] UPDATED)
    Variable-density cellular structures can overcome connectivity and manufacturability issues of topologically optimized structures, particularly those represented as discrete density maps. However, the optimization of such cellular structures is challenging due to the multiscale design problem. Past work addressing this problem generally either only optimizes the volume fraction of single-type unit cells but ignores the effects of unit cell geometry on properties, or considers the geometry-property relation but builds this relation via heuristics. In contrast, we propose a simple yet more principled way to accurately model the property to geometry mapping using a conditional deep generative model, named Inverse Homogenization Generative Adversarial Network (IH-GAN). It learns the conditional distribution of unit cell geometries given properties and can realize the one-to-many mapping from properties to geometries. We further reduce the complexity of IH-GAN by using the implicit function parameterization to represent unit cell geometries. Results show that our method can 1) generate various unit cells that satisfy given material properties with high accuracy ($R^2$-scores between target properties and properties of generated unit cells $>98\%$) and 2) improve the optimized structural performance over the conventional variable-density single-type structure. In the minimum compliance example, our IH-GAN generated structure achieves a $79.7\%$ reduction in concentrated stress and an extra $3.03\%$ reduction in displacement. In the target deformation examples, our IH-GAN generated structure reduces the target matching error by $86.4\%$ and $79.6\%$ for two test cases, respectively. We also demonstrated that the connectivity issue for multi-type unit cells can be solved by transition layer blending.  ( 3 min )
    Forecasting Foreign Exchange Rates With Parameter-Free Regression Networks Tuned By Bayesian Optimization. (arXiv:2204.12914v1 [q-fin.ST])
    The article is concerned with the problem of multi-step financial time series forecasting of Foreign Exchange (FX) rates. To address this problem, we introduce a parameter-free regression network termed RegPred Net. The exchange rate to forecast is treated as a stochastic process. It is assumed to follow a generalization of Brownian motion and the mean-reverting process referred to as the generalized Ornstein-Uhlenbeck (OU) process, with time-dependent coefficients. Using past observed values of the input time series, these coefficients can be regressed online by the cells of the first half of the network (Reg). The regressed coefficients depend only on, but are very sensitive to, a small number of hyperparameters that must be set by a global optimization procedure, for which Bayesian optimization is an adequate heuristic. Thanks to its multi-layered architecture, the second half of the regression network (Pred) can project time-dependent values for the OU process coefficients and generate realistic trajectories of the time series. Predictions can be easily derived in the form of expected values estimated by averaging values obtained by Monte Carlo simulation. The forecasting accuracy on a 100-day horizon is evaluated for several of the most important FX rates such as EUR/USD, EUR/CNY, and EUR/GBP. Our experimental results show that the RegPred Net significantly outperforms ARMA, ARIMA, LSTMs, and Autoencoder-LSTM models in this task.
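    The last step described in this abstract, averaging Monte Carlo trajectories of an OU process with time-dependent coefficients, can be sketched as follows. This is not RegPred Net: the coefficients here are hypothetical constants supplied by hand rather than regressed by a network, the parameterisation dX = theta(t) (mu(t) - X) dt + sigma(t) dW is one common form of the OU process and is assumed, and the Euler-Maruyama discretisation and the name `simulate_ou` are illustrative choices.

```python
import math
import random


def simulate_ou(x0, theta, mu, sigma, horizon, dt=1.0, n_paths=2000, seed=0):
    """Euler-Maruyama simulation of dX = theta(t) * (mu(t) - X) dt + sigma(t) dW.

    theta, mu and sigma are callables of time. Returns the Monte Carlo mean
    of the simulated paths at every step, including the starting point.
    """
    rng = random.Random(seed)
    n_steps = int(round(horizon / dt))
    sums = [0.0] * (n_steps + 1)
    for _ in range(n_paths):
        x = x0
        sums[0] += x
        for i in range(n_steps):
            t = i * dt
            drift = theta(t) * (mu(t) - x) * dt
            shock = sigma(t) * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            x += drift + shock
            sums[i + 1] += x
    return [s / n_paths for s in sums]


# Hypothetical coefficients: a rate starting at 1.10 that reverts towards 1.20.
forecast = simulate_ou(
    x0=1.10,
    theta=lambda t: 0.05,
    mu=lambda t: 1.20,
    sigma=lambda t: 0.01,
    horizon=100,
)
print(forecast[0], forecast[-1])
```

    With constant coefficients the averaged path simply relaxes towards the long-run level; passing time-varying callables for `theta`, `mu` and `sigma` is what would let projected coefficients shape the forecast.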
    On the Dynamics of Inference and Learning. (arXiv:2204.12939v1 [cond-mat.dis-nn])
    Statistical Inference is the process of determining a probability distribution over the space of parameters of a model given a data set. As more data becomes available, this probability distribution is updated via the application of Bayes' theorem. We present a treatment of this Bayesian updating process as a continuous dynamical system. Statistical inference is then governed by a first order differential equation describing a trajectory or flow in the information geometry determined by a parametric family of models. We solve this equation for some simple models and show that when the Cramér-Rao bound is saturated the learning rate is governed by a simple $1/T$ power-law, with $T$ a time-like variable denoting the quantity of data. The presence of hidden variables can be incorporated in this setting, leading to an additional driving term in the resulting flow equation. We illustrate this with both analytic and numerical examples based on Gaussians and Gaussian Random Processes and inference of the coupling constant in the 1D Ising model. Finally we compare the qualitative behaviour exhibited by Bayesian flows to the training of various neural networks on benchmarked data sets such as MNIST and CIFAR10 and show that for networks exhibiting small final losses the simple power-law is also satisfied.
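    One textbook case in which a $1/T$ law appears is Bayesian estimation of a Gaussian mean with known noise variance: the posterior variance after $T$ observations is $1/(1/\tau_0^2 + T/\sigma^2)$, which behaves like $\sigma^2/T$ for large $T$. The sketch below only illustrates that standard conjugate update; it is not the flow equation from the paper, and the prior and noise values are hypothetical.

```python
def posterior_variance(prior_var, noise_var, n_obs):
    """Posterior variance of a Gaussian mean after n_obs observations.

    Assumes a Gaussian prior on the mean and a known observation noise variance.
    """
    return 1.0 / (1.0 / prior_var + n_obs / noise_var)


noise_var = 4.0  # hypothetical known noise variance
for T in (10, 100, 1000, 10000):
    v = posterior_variance(prior_var=1.0, noise_var=noise_var, n_obs=T)
    # v * T approaches noise_var as T grows, i.e. v decays like noise_var / T.
    print(T, v, v * T)
```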
    First do no harm: counterfactual objective functions for safe & ethical AI. (arXiv:2204.12993v1 [cs.AI])
    To act safely and ethically in the real world, agents must be able to reason about harm and avoid harmful actions. In this paper we develop the first statistical definition of harm and a framework for factoring harm into algorithmic decisions. We argue that harm is fundamentally a counterfactual quantity, and show that standard machine learning algorithms are guaranteed to pursue harmful policies in certain environments. To resolve this, we derive a family of counterfactual objective functions that robustly mitigate harm. We demonstrate our approach with a statistical model for identifying optimal drug doses. While identifying optimal doses using the causal treatment effect results in harmful treatment decisions, our counterfactual algorithm identifies doses that are far less harmful without sacrificing efficacy. Our results show that counterfactual reasoning is a key ingredient for safe and ethical AI.

  • Open

    [P] Auto-encoder image dimension error
    Hello, I have a problem with my auto-encoder model. I built it to produce an abnormally high MSE when presented with an image too different from what it knows. I have this little function at the end of my program to predict whether or not the image presented is an anomaly: ` def check_anomaly(img_path): density_threshold = 2500 #Set this value based on the above exercise reconstruction_error_threshold = 0.004 # Set this value based on the above exercise img = Image.open(img_path) img = np.array(img.resize((128,128), Image.ANTIALIAS)) plt.imshow(img) img = img / 255. img = img[np.newaxis, :,:,:] encoded_img = encoder_model.predict([[img]]) encoded_img = [np.reshape(img, (out_vector_shape)) for img in encoded_img] density = kde.score_samples(encoded_img)[0] reconstruc…
    What's the actual difference in simple terms between Cloudera Data Science Workbench, DataRobot, and DataBricks? [Discussion]
    I've also noticed Cloudera and DataRobot seem to follow user pricing where DataBricks follows the standard hourly machine rate. submitted by /u/SmarterChild8675309
    [Project] Multi-Bounding box identification
    I have a data set of satellite images of aircraft. For the labels of these images I have the bounding boxes for all aircraft in that image. I am looking for some advice on where to get started with this. I need to estimate all bounding boxes for these aircraft. Any suggestions? submitted by /u/The_Dov
    [D] NLP: anyone familiar with taxonomy extraction and evaluation?
    Hi all, does anyone have experience with automatic taxonomy/ontology extraction from an unlabelled corpus, especially how to evaluate the extracted structures without gold standards? Most published papers invite students/researchers to conduct manual reviews, which makes it very difficult to compare results. Thanks in advance. submitted by /u/CestLucas
    [D] Precision-Recall curve with best F1 Score of 0.56
    I am evaluating the performance of a model. I get different Precision-Recall curves for different configurations, and the best F1 score (0.56) corresponds to the red point. Would you consider this performance acceptable? I know there is a lot of room for improvement, but is it okay or is it extremely poor performance? https://preview.redd.it/k5qihccdb5w81.png?width=784&format=png&auto=webp&s=a0f3408f2ff671ee491faa50752b6c8e0cf17e4c Also, if I understood it correctly, each point of the Precision-Recall curve has its own F1 score, right? Thank you! submitted by /u/SeaResponsibility176
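    On the last question in this post: F1 is the harmonic mean of precision and recall, so every (precision, recall) point on the curve does have its own F1 value, and the "best F1" is the maximum over those points. A minimal sketch, using hypothetical curve points rather than the poster's data:

```python
def f1_per_point(precisions, recalls):
    """Return the F1 score for each (precision, recall) point on a curve."""
    scores = []
    for p, r in zip(precisions, recalls):
        # Harmonic mean of precision and recall; defined as 0 when both are 0.
        scores.append(0.0 if p + r == 0 else 2 * p * r / (p + r))
    return scores


# Hypothetical points along a Precision-Recall curve (one per threshold).
precisions = [1.0, 0.8, 0.6, 0.5, 0.3]
recalls = [0.1, 0.3, 0.5, 0.64, 0.9]

scores = f1_per_point(precisions, recalls)
best = max(range(len(scores)), key=scores.__getitem__)
print(best, precisions[best], recalls[best], scores[best])
```

    Whether a best F1 of 0.56 is acceptable depends on the class balance and on a baseline for the task, which the curve alone does not show.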
    [P] surgeon-pytorch – a small library to inspect intermediate layers of pyTorch models
    I've made surgeon https://github.com/archinetai/surgeon-pytorch, a small library to inspect the intermediate output layers of pyTorch models without changing the original implementation. This can be very useful if you are using pre-trained models (e.g. from Huggingface or torch.hub) and want to get embeddings, attention matrices, or simply debug the model without adding additional code – which is often hard to do without changing the implementation. I hope this can be helpful to anyone! submitted by /u/Aglitter [link] [comments]  ( 1 min )
    [D] Thoughts on the AI4 conference?
    Lately, I have been receiving many messages from the organizers of this conference that they have some free passes to pass on to attend it: https://ai4.io/usa/ While the speaker line-up from the industry appears impressive, I have not heard of this conference before. Any thoughts on how legit/good this conference is? submitted by /u/roalddahl14 [link] [comments]  ( 1 min )
    [D] Has anyone attempted to recreate the work described in the Deep Depth, Deep Pose and Deep Uncertainty for Monocular Visual Odometry paper?
    The D3VO paper from the Tech. University of Munich looks quite promising, however that Uni does not seem very keen on publishing code along with their papers. Has anyone already tried to reproduce their work? What kind of results did you see in your version? What hardware did you run it on? How difficult was it to reproduce their models in PyTorch? submitted by /u/autojazari [link] [comments]  ( 1 min )
    [D] - Are GFlow Nets considered diffusion models?
    I stumbled upon GFlow Net and, in my opinion, it looks very similar to diffusion models. There is a touch of RL in GFlow Net, but the main idea is very similar to diffusion models. Is that right, or am I missing something? submitted by /u/dimem16 [link] [comments]
    [D] How is it checked if models do not just memorize their training examples?
    Hello everyone, this post is about generative models! (i.e. Score-based-generative models, GANs, etc.) on leaderboards like this https://paperswithcode.com/sota/image-generation-on-cifar-10 How do they check if the models do not just memorize the training examples? The FID score would be optimal in case you would just generate training examples again. Best submitted by /u/future_gcp_poweruser [link] [comments]  ( 5 min )
    [D] ICLR 2022 blog post track
    It seems like the list of accepted blog posts has been published for the ICLR 2022 blog post track (https://iclr-blog-track.github.io/). They also invited Karpathy to publish his most recent blog post on the history and future of convnets. What do you think about the blog posts? Any blog posts that you particularly like or that are definitely worth reading? Any blog posts that actually have interesting contributions and/or that you plan to cite? But maybe more importantly: how do you think this will evolve? They seem to have decided to organise the blog post track again for next year's ICLR already. Do you think these kind of publications could have an impact on how we do science or is this rather a nice extra on top of scientific work? There has only been one post in this subreddit on one of the accepted blog posts (since the announcement). I think there are some nice blog posts in there, so I expected to find some discussions here, but it seems like it is either ignored or not worth discussing. Therefore, I thought it would be interesting to (try to) start a discussion. TL;DR: what do you think of the ICLR blog post track and/or the accepted blog posts? submitted by /u/mr_tsjolder [link] [comments]  ( 1 min )
    [D] MLOps tools for automatic fine tuning of deployed machine learning models
    I'm working on a ML model for data extraction out of documents. The model is trained and deployed into production. For fine tuning the model and improving performance I added the possibility for users to correct the extracted data. A concrete example would be: the model labels a word in the document as "company_name" the user corrects it to "street_name". This correction is then used to fine tune the model. Currently the fine tuning is done manually. That is, if the number of corrections exceeds some threshold I take them and start a new training and evaluate the new model before putting it into production. My question would be: is there an MLOps tool that automates this process? Or should I write one myself? I am aware of the tool seldon core that offers A/B testing for comparing the old model with the new one and putting it into production. But unfortunately it does not offer automatic fine tuning. Or that's what I understood from their website. submitted by /u/alzoubi36 [link] [comments]  ( 1 min )
    [P] Create interactive slides for Machine Learning models from Jupyter Notebook
    Would you like to create an interactive presentation for your ML model directly from Jupyter Notebook? I'm working on an open-source project for converting notebooks into interactive documents. Recently, I've added the option to turn notebooks into interactive slides. You can showcase your Machine Learning model as an interactive presentation. During the presentation, the user can change values and recompute the slide! I've created a demo presentation where I use Random Forest to predict Iris species. The screenshot recording of the presentation: https://github.com/pplonski/ml-model-slides/raw/main/media/slides-from-ml-model.gif The presentation is available online (deployed at Heroku) https://ml-model-presentation.herokuapp.com/ What is more, the presentation code is on GitHub https://github.com/pplonski/ml-model-slides (yes, code is presentation! bye-bye PPT) In case anyone is interested, the framework is called Mercury. submitted by /u/pp314159 [link] [comments]  ( 1 min )
    [D] Feature engineering automation?
    Hello, I’m working as a Data Scientist currently and I realized that most of my time is spent on feature engineering. My general practice is that I create aggregations of data (via sql because of the amount of data that needs to be processed) like sum, mean, avg, std, median, q25, q75. I need to do it on a few dozen features. Also I am calculating these aggregations on different time windows: previous week, previous month, previous 3 month. At the end I end up with hundreds of features and I need to select the ones that make any sense, contain relevant information. Currently I am applying pandas profiling, or sweetviz on this huge dataset and trying to analyze it by eyeballing the results. My main challenge is that this process is highly repetitive and manual. I am wondering if there is any tool out there that could help me automate this process and make some parts reusable? I like having a UI especially for visualizing the data. Am I doing something wrong or is there a tool that I’m clearly not aware of? submitted by /u/sgergely [link] [comments]  ( 4 min )
    [D] How to store/surface predictions along with immutable data in a database api?
    Hi, I am faced with something that I thought was simple, but I have been thinking about it so much that I am now very confused. Any suggestions are helpful! Let's assume the scenario that you have a database of cars. For each car you have a brand, model, colour etc. You have an API for this db which you can call with a car id to surface the car data. People rely on this API for fast and up-to-date car information. You have 1 million cars in the db. Assume now there is a requirement to predict a yes/no for whether the car has a black front bumper (completely made-up requirement). This has to be surfaced to the API users along with the car data. You have built a classifier for this, and it takes some time to surface a prediction, e.g. 2 mins. You select what you think is the best operating point for your classifier. When a new car gets added to the db, you can now run your predictor and get back a probability and, depending on the operating point, a yes/no. What is the best approach here? Do you store the result of this prediction in the database (probability, model_version and boolean result) alongside the rest of the car data? This is less flexible, because if you decide to tweak the operating point you will need to recalculate the boolean values for all cars, but you have consistency between what you have in the db and what you return to your users. If there is a mistake, you can readily fix it by altering the boolean field. Or do you just store the probabilities and decide on the yes/no using the model_version and operating point only when you surface the data to the end user? This is more flexible if you decide to tweak the operating point of the classifier. Or would you have a completely different approach? Would you change anything if, for some cars, you already have ground truth in the db for whether they have a black bumper? Thanks for reading! submitted by /u/42isthenumber_ [link] [comments]  ( 2 min )
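The second option in the post (store the probability and model version, apply the operating point only when serving) could be sketched roughly as below. All names here (`StoredPrediction`, `OPERATING_POINTS`, `surface`, the `ground_truth` field) are hypothetical and only illustrate the idea; this is not a recommendation of one option over the other.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StoredPrediction:
    """What would be persisted next to the car row (hypothetical schema)."""
    car_id: int
    probability: float
    model_version: str
    ground_truth: Optional[bool] = None  # filled in when a verified label exists


# Hypothetical operating points, keyed by model version, kept outside the car table.
OPERATING_POINTS = {"v1": 0.5, "v2": 0.7}


def surface(pred, operating_points=OPERATING_POINTS):
    """Build the API payload, deciding yes/no at read time."""
    if pred.ground_truth is not None:
        # A verified label always wins over the model output.
        decision, source = pred.ground_truth, "ground_truth"
    else:
        threshold = operating_points[pred.model_version]
        decision, source = pred.probability >= threshold, "model"
    return {"car_id": pred.car_id, "has_black_bumper": decision, "source": source}


print(surface(StoredPrediction(1, 0.6, "v1")))
print(surface(StoredPrediction(2, 0.6, "v2")))
print(surface(StoredPrediction(3, 0.1, "v1", ground_truth=True)))
```

With this layout, changing an operating point is a one-line config change rather than a rewrite of a million rows, at the cost of the boolean no longer being directly queryable in the database.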
  • Open

    [2202.12742] Learning Relative Return Policies With Upside-Down Reinforcement Learning
    submitted by /u/chimp73 [link] [comments]
    What does it mean to centralise the observation in MARL?
    submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
    Off-policy algorithm with batch of actions
    Hello everyone, first of all sorry for my poor English. I am fairly new to Deep RL, and RL in general. I want to implement a custom environment with multiple robots in it. Each robot would be given the same task, so what I am trying to do is simulate parallelism. My question is: if I have 100 robots in the simulation, do I have to instantiate 100 agents (neural networks) to control those robots, or will a single network with a batch size of 100 suffice? Does a batch of observations in any way disturb the agent's action (network) output? A single neural network would be much more memory efficient. So far, I've seen that when acting in the env, the agent takes an observation of batch size 1 and outputs the corresponding action for that observation. Since each observation in the batch is propagated through the network without calculating gradients, it should not be affected by the batch size. If my way of thinking is wrong, could anyone explain why? :) submitted by /u/Dexter_fixxor [link] [comments]  ( 1 min )
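The reasoning in the post can be checked on a toy example: a shared policy applied to a batch treats each row independently, so the batched output matches running the observations one at a time. The sketch below uses a tiny hand-written linear policy in plain Python (the weights and observations are made up); it assumes the network has no layers that mix information across the batch, which is not true of, for example, batch normalization in training mode.

```python
def policy_forward(weights, bias, observations):
    """Apply one shared linear policy to every observation in the batch."""
    actions = []
    for obs in observations:
        # Each output unit is a dot product of one weight row with this
        # observation only; no other row of the batch is involved.
        action = [sum(w * x for w, x in zip(row, obs)) + b
                  for row, b in zip(weights, bias)]
        actions.append(action)
    return actions


# Hypothetical 2-input, 2-output policy shared by all robots.
weights = [[1.0, -1.0], [0.5, 0.5]]
bias = [0.0, 1.0]

# One observation per robot (three robots here instead of 100).
batch = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]

batched = policy_forward(weights, bias, batch)
one_by_one = [policy_forward(weights, bias, [obs])[0] for obs in batch]

print(batched)
print(batched == one_by_one)
```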
    "NeuPL: Neural Population Learning", Liu et al 2022 (encoding PBT agents into a single multi-policy agent)
    submitted by /u/gwern [link] [comments]  ( 1 min )
    How do Actor-Critic networks reduce the variance compared to other PG methods like REINFORCE?
    I understand why REINFORCE has high variance, but how does AC mitigate it? submitted by /u/aabra__ka__daabra [link] [comments]  ( 1 min )
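One common way to see it: REINFORCE scales the score function by the raw return, while an actor-critic subtracts the critic's value estimate first, so the gradient is scaled by an advantage instead. Subtracting a baseline leaves the expected gradient unchanged but can shrink its variance a lot. The sketch below works this out exactly for a made-up two-action bandit with a sigmoid policy; the rewards and the choice of a perfect critic are assumptions for illustration (a learned, bootstrapped critic also adds some bias, which this toy does not show).

```python
def grad_samples(p, rewards, baseline):
    """Return [(probability, gradient sample)] for both actions.

    p is the probability of action 1 under a sigmoid policy, for which the
    score d/dtheta log pi(a) is (1 - p) for action 1 and -p for action 0.
    """
    scores = {1: 1.0 - p, 0: -p}
    probs = {1: p, 0: 1.0 - p}
    return [(probs[a], scores[a] * (rewards[a] - baseline)) for a in (0, 1)]


def mean_and_variance(weighted_samples):
    """Exact mean and variance of a discrete distribution of samples."""
    mean = sum(w * g for w, g in weighted_samples)
    var = sum(w * (g - mean) ** 2 for w, g in weighted_samples)
    return mean, var


p = 0.5
rewards = {0: 10.0, 1: 11.0}                   # hypothetical rewards
value = (1.0 - p) * rewards[0] + p * rewards[1]  # what a perfect critic predicts

reinforce = mean_and_variance(grad_samples(p, rewards, baseline=0.0))
with_critic = mean_and_variance(grad_samples(p, rewards, baseline=value))

print("REINFORCE    mean, variance:", reinforce)
print("with critic  mean, variance:", with_critic)
```

Both estimators have the same mean (0.25 here), but the large shared reward offset of about 10 dominates the REINFORCE samples, while the baselined version only sees the small difference between the two actions.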
    on-policy vs off-policy
    I'm looking for a concrete explanation of the difference between on-policy and off-policy learning strategies. If possible, the mathematics could also be explained. submitted by /u/Western-Age3148 [link] [comments]  ( 1 min )
    Open position at ZF Friedrichshafen AG- Algorithmenentwickler AI & ML
    Feel free to apply! Algorithmenentwickler-AI-&-Machine-Learning-Motion-Planning submitted by /u/gab_ma [link] [comments]
    Internships/Thesis in the field of AI @ZF
    We are currently offering Internships & Theses in the Field of Artificial Intelligence. Here are the links to our open positions. Feel free to share the post! ✌🏽 Mandatory Internship Software Development in the field of Artificial Intelligence Pflichtpraktikant Reinforcement Learning Algorithmen Pflichtpraktikum/ Masterarbeit: Reinforcement Learning submitted by /u/gab_ma [link] [comments]  ( 1 min )
    General questions for those of you up to date on the topic.
    What does the future of deep reinforcement learning look like? I feel like people were pretty hyped about it a year or two ago. Is it heavily researched now and expected to be used more in the future? What are some real world tasks that it can help with? I've seen it used a lot for playing games, self driving cars, and manufacturing robotics. Anything else that it can be applied to? Will it likely be used for autonomous robots when they become more common? What are some areas that can be improved upon? What are the most recent advancements and what is a good topic to focus on for future research? Thank you. submitted by /u/johnGettings [link] [comments]  ( 2 min )
  • Open

    When will humorous AIs press our buttons with their jokes? | Psyche Ideas
    submitted by /u/estasfuera [link] [comments]
    Best AI for blog post creation? Or what tools do you use to get an outline faster?
    I've only ever used writesonic.com; I got premium account access for free. I love it for when I need to quickly spin an article on a backlink-building spree. I haven't tested it against others, but overall it's good, and improvements are being made to it all the time (it's in active development). submitted by /u/CliffWoolum [link] [comments]  ( 1 min )
    Showcase your ML model in a python Web GUI
    submitted by /u/Illustrious_Row_9971 [link] [comments]
    John Deere is becoming one of the world's most important AI companies
    submitted by /u/estasfuera [link] [comments]  ( 1 min )
    I keep getting terrible advice from AI assistants. Help me rate the worst and the best responses :)
    I was playing around with several AI models and this sort of stuff happens all the time. https://preview.redd.it/6arq56nih3w81.png?width=1163&format=png&auto=webp&s=5ca68dce977ae671b0b928c2f1ec3cdd8478257f I decided to prepare a compilation and rank the answers. Can you help me with deciding how bad or good are some of the answers by taking a survey here? Edit: I apologize if some of you felt tricked into completing the survey by the original version of the post. It does have about 50 questions but they are mostly just yes/no, good/bad, so it shouldn't take longer than 10 minutes. I intend to write a piece about how AI assistants are doing and prepare a compilation of AI fails. I will share the results :) submitted by /u/KazRainer [link] [comments]  ( 1 min )
    Is there a dataset for personal items?
    Hi! I'm looking for a dataset containing images of personal items (wallet, keys, phone etc.), annotated with bounding boxes. I can't seem to find anything. Does anyone know of such a dataset? Thanks in advance! submitted by /u/ifinty [link] [comments]  ( 1 min )
    Can I get AI language bots, who've already been trained?
    I'm brand new to programming. I am thinking I need a bot to skim through PDFs so that, through trial and error, I can take the PDFs and make the bot come up with suggestions for scripts or books. However, I need to use an AI which already has training in English grammar, sentence structure and language/information flow, not necessarily in interpretation, meaning, philosophy or other higher levels of language proficiency. Say I wanted to write a novel about sailors. I'd let it skim some 100 novels on sailing and generate inputs. Then I'd match it up against a separate set of AIs so they fact-check each other, or even against forum posts and blogs from some sailing subreddits or news feeds. I could do this with any topic: medicine, engineering, biology, drama novels etc. Are there any free/open-source bots that already know the English language and can read books and make suggestions based on guided inputs? submitted by /u/International__ [link] [comments]  ( 3 min )
    Edit Images Using Sketches! NVIDIA EditGAN Explained. Control any feature from quick drafts
    submitted by /u/OnlyProggingForFun [link] [comments]  ( 1 min )
    Train Test Split in Way Too Much Depth
    submitted by /u/mgalarny [link] [comments]
    AI can't help blind people like me with everything... But it certainly can help me find my clothes
    submitted by /u/thisisjoshtseng [link] [comments]  ( 1 min )
    guess its still soon to ask bots about anything feeling related
    ​ https://preview.redd.it/jbvwyd7u41w81.png?width=393&format=png&auto=webp&s=6cbd5568a84c4d96c03ca9a99bbed73b1576a1d9 any bots that can answer better? submitted by /u/MalwareLord [link] [comments]
    How do you “get your own AI bot?”
    I hear about people who trained or fed an AI bot to say, write a Biden Speech, or create poetry. I want to create a bot that pulls important numbers from my local community to post to social media — like lowest gas prices in town, weather average, mortgage interest rates and a couple other things. submitted by /u/No-Setting2541 [link] [comments]  ( 1 min )
    Artificial Nightmares: Disney Princess Ella || Clip Guided Diffusion AI Art Video [4K 20 FPS]
    submitted by /u/Thenamessd [link] [comments]
    Thinking of starting a personal project to filter profanity, wanted to air out my plan for recommendations before starting
    So, I'm not super comfortable with swearing, but there are a few YouTubers I follow that are as vulgar as they are hilarious. My plan, as it stands, is to use python in concert with some sort of generic voice to text software to get a list of timestamps for any F words, and then use that list of timestamps to generate an FFMPEG command to cut the audio at each timestamp and remerge into a new video. Does anyone have any recommendations for audio processing packages or better approaches? I'm keen to use anything with hardware acceleration (because making a script to abuse my shiny new graphics card is half my motivation). P.S. I haven't touched python much since graduating, can't wait to dig my claws back in. Although... if anyone has an audio processing package in C# that would be hecking cool too Have a nice day y'all submitted by /u/The_Real_Slim_Lemon [link] [comments]  ( 1 min )
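A rough sketch of the "timestamps to FFMPEG command" step from the post, in Python. Note that it swaps in a simpler technique than the one described: instead of cutting and remerging, it mutes each span with ffmpeg's `volume` audio filter and a timeline `enable` expression, and copies the video stream untouched. The `(start, end)` spans are assumed to come from whatever speech-to-text tool is used; the function name and file names are hypothetical.

```python
def build_mute_command(src, dst, spans):
    """Return an ffmpeg argument list that silences the given (start, end) spans.

    spans are in seconds. With no spans, the file is simply stream-copied.
    """
    if not spans:
        return ["ffmpeg", "-i", src, "-c", "copy", dst]
    # One volume filter per span; each is only active between its timestamps.
    filters = ",".join(
        f"volume=enable='between(t,{start:.2f},{end:.2f})':volume=0"
        for start, end in spans
    )
    # -af applies the audio filter chain; -c:v copy avoids re-encoding video.
    return ["ffmpeg", "-i", src, "-af", filters, "-c:v", "copy", dst]


# Hypothetical timestamps from a speech-to-text pass.
spans = [(1.5, 2.0), (10.0, 10.4)]
cmd = build_mute_command("input.mp4", "clean.mp4", spans)
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)`; passing a list rather than a shell string avoids having to escape the quotes inside the filter expression.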
  • Open

    Google Colab for Machine Learning Projects
    Have you ever wanted an easy-to-configure interactive environment to run your machine learning code that came with access to GPUs for free? Google Colab is the answer you’ve been looking for. It is a convenient and easy to use way to run Jupyter notebooks on the cloud and their free version comes with some limited […] The post Google Colab for Machine Learning Projects appeared first on Machine Learning Mastery.  ( 13 min )
  • Open

    Machine learning, harnessed to extreme computing, aids fusion energy development
    Linking techniques from machine learning with advanced numerical simulations, MIT researchers take an important step in state-of-the-art predictions for fusion plasmas.  ( 6 min )
  • Open

    MoLeR: Creating a path to more efficient drug design
    Drug discovery has come a long way from its roots in serendipity. It is now an increasingly rational process, in which one important phase, called lead optimization, is the stepwise search for promising drug candidate compounds in the lab. In this phase, expert medicinal chemists work to improve “hit” molecules—compounds that demonstrate some promising properties, […] The post MoLeR: Creating a path to more efficient drug design appeared first on Microsoft Research.  ( 6 min )
  • Open

    Answers Blowin’ in the Wind: HPC Code Gives Renewable Energy a Lift
    A hundred and forty turbines in the North Sea, and some GPUs in the cloud, pumped wind under the wings of David Standingford and Jamil Appa’s dream. As colleagues at a British aerospace firm, they shared a vision of starting a company to apply their expertise in high performance computing across many industries. The post Answers Blowin’ in the Wind: HPC Code Gives Renewable Energy a Lift appeared first on NVIDIA Blog.  ( 4 min )
    What Is Conversational AI? ZeroShot Bot CEO Jason Mars Explains
    Entrepreneur Jason Mars calls conversation our “first technology.” Before humans invented the wheel, crafted a spear or tamed fire, we mastered the superpower of talking to one another. That makes conversation an incredibly important tool. But if you’ve dealt with the automated chatbots deployed by the customer service arms of just about any big organization […] The post What Is Conversational AI? ZeroShot Bot CEO Jason Mars Explains appeared first on NVIDIA Blog.  ( 2 min )
  • Open

    TinyML, an underrated field of Machine Learning
    TinyML is a groundbreaking technology! Possessing a lot of potential it is sure to grow exponentially in the coming years.  ( 3 min )
  • Open

    Calculating where projective lines intersect
    A couple days ago I wrote about homogeneous coordinates and projective planes. I said that the lines y = 5 and y = 6 intersect in a point “at infinity.” In projective geometry any two distinct lines intersect in exactly one point, and you can compute that intersection point the same way, whether the intersection is […] Calculating where projective lines intersect first appeared on John D. Cook.  ( 4 min )
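One standard way to do this computation, which may or may not be the one the full article uses since the excerpt is truncated: write each line ax + by + cz = 0 as the triple (a, b, c) and take the cross product of the two triples. The result is the intersection point in homogeneous coordinates, and a zero last coordinate means a point at infinity. A small sketch in plain Python:

```python
def cross(u, v):
    """Cross product of two 3-vectors given as tuples."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def intersect(line1, line2):
    """Intersection of two projective lines (a, b, c) meaning ax + by + cz = 0."""
    return cross(line1, line2)


# y = 5 is 0x + 1y - 5z = 0, and y = 6 is 0x + 1y - 6z = 0.
parallel = intersect((0, 1, -5), (0, 1, -6))
print(parallel)   # last coordinate is 0: a point at infinity in the x direction

# x = 2 and y = 5 meet at an ordinary point; divide by z to read it off.
x, y, z = intersect((1, 0, -2), (0, 1, -5))
print((x / z, y / z))
```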

  • Open

    [D] Transformer-Models-from-Scratch
    I recently started learning machine learning, and I have implemented several transformer models for different tasks from scratch in PyTorch in my Github repository: Transformer-Models-from-Scratch The notebooks are self-contained. And I also included a note I wrote on transformers. Hope it's helpful for anyone learning the transformer model! Let me know if you have any comments! submitted by /u/hbchen-one [link] [comments]
    [P] Curated List of Company Blogs about MLops/ Infra
    Hi all. I am starting a github repo to compile a list of company blogs about their MLops/ infra. Please feel free to contribute if you are interested: https://github.com/enochkan/awesome-ml-stack submitted by /u/kanxx030 [link] [comments]
    [P] TorToiSe - a true zero-shot multi-voice TTS engine
    I'd like to show off a TTS system I have been working on for the past year. I've open-sourced all the code and the trained model weights: https://github.com/neonbjb/tortoise-tts This was born out of a desire to reproduce the original DALLE with speech. It is "zero-shot" because you feed the text and examples of a voice to mimic as prompts to an autoregressive LLM. I think the results are fantastic. Here are some samples: https://nonint.com/static/tortoise_v2_examples.html Here is a colab in which you can try out the whole system: https://colab.research.google.com/drive/1wVVqUPqwiDBUVeWWOUNglpGhU3hg_cbR submitted by /u/neonbjb [link] [comments]  ( 1 min )
    [P] Baseten – Build ML-powered applications
    Hey, we've been building Baseten to be able quickly deploy models, backends and frontends. I'd love to get your feedback. submitted by /u/Available-Cookie2754 [link] [comments]
    [N] Upcoming talk on Data centric approach to AI from experience at Youtube, ScaleAI and Apple
    Hey folks, There’s an upcoming free talk on May 13. This is what I know: Vijay K, Head of Engineering at Scale.ai, and Mike Wu, Stanford PhD in Machine Learning are going to be talking about strategies for taking a data centric approach to AI, and Vijay’s lessons from doing this at Apple, YouTube, and Scale AI. There’s a renewed focus on the data layer as a foundation for successful ML projects, and Vijay participated in this transformation firsthand. You’ll be able to hear his reflections and learnings, should be super useful! Sign-up link is here, see you there. submitted by /u/sb2nov [link] [comments]  ( 1 min )
    [News] Adept AI Labs Launches
    This seems like a pretty powerhouse team adept.ai/post/introducing-adept submitted by /u/mrpogiface [link] [comments]
    [D] Is it possible to train a very large 6B model to learn from a single training input 1 2 3 to infer 6 7 8 (9)?
    Hi, I was wondering how long it would take or if it would be possible for a model with 6 billion neurons to become able to predict that for 6 7 8 the next number should be 9, given only a single training example 1 2 3. Let's say we use an architecture like the ones used in GPT. Essentially, I am trying to make sense of something here: is the human design component in DL the weak link in the AGI chain? Much like we could not achieve a true AI by manually coding each rule with if's and else, perhaps we cannot achieve true intelligence by manually designing the networks. Can we grow a model organically so it immediately gets this right from a single training example, and continue using the same network to keep on learning from rich input and making new observations. For guidance, the network can use what it already knows, learning outward. A pre-programmed inherent ability to find motifs, for example with STUMPY and Time Series Analysis, allows it to make relationship observations, and is how the agent immediately guesses that a numeric pattern increments by 1 from a single training example. Putting this into DL, perhaps we can achieve something less organic but still good using a model-agnostic architecture with multi-resolution 'tiles' of neurons. Some property of the information would theoretically allow deciding if the tile should be upgraded to a higher resolution (more neurons), and a side buffer keeps track of the connections between these tiles and tries to move the topology forward, adding some small clusters, removing others, attempting to connect them, etc. submitted by /u/o_snake-monster_o_o_ [link] [comments]  ( 2 min )
    [D] Understanding the use of EMA in Diffusion models
    Reading the original diffusion models paper and the improved diffusion models paper by OpenAI, I noticed that they use EMA (exponential moving average) to update the parameters of the models. So I started looking at the code OpenAI published for their version of the diffusion models. In the code, I see that during training the model has its params stored in a variable called "master_params", and that they create a deep copy of those params called ema_params. Looking at the "optimize_normal" method, I see that they update the model params using AdamW and gradient descent, and after that they update the EMA params using the EMA equation. That means the actual model params take a full gradient descent step towards the minimum of the loss function, and then the EMA params take a pseudo step from their previous values towards the params produced by the optimizer. But in the rest of the code, all I see is that they save a checkpoint of the EMA params to disk; they never update the model params using them. So my question is: what is the EMA for if it is not used during training and the model is fully updated using "classical" machine learning optimization with gradient descent? Is it only at inference time that they load the EMA params to generate images, instead of the regular params that were updated using AdamW? submitted by /u/eyalmazuz [link] [comments]  ( 2 min )
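The pattern described in the post, as a minimal plain-Python sketch: the optimizer only ever touches the master params, and the EMA copy is a smoothed shadow that trails behind them and is what gets checkpointed for sampling. The function name, the `rate` value and the alternating "optimizer steps" below are made up for illustration and are not taken from the OpenAI code.

```python
def ema_update(ema_params, model_params, rate=0.9999):
    """Move each EMA param a small step towards the current model param."""
    return [rate * e + (1.0 - rate) * p for e, p in zip(ema_params, model_params)]


master_params = [0.0]
ema_params = list(master_params)   # deep copy at the start of training

# Stand-in for noisy optimizer steps: the master param swings between +1 and -1.
for new_value in [1.0, -1.0, 1.0, -1.0]:
    master_params = [new_value]                                  # optimizer step
    ema_params = ema_update(ema_params, master_params, rate=0.9)  # shadow update

print("master:", master_params)
print("ema:   ", ema_params)
```

The master param ends wherever the last noisy step left it, while the EMA value stays close to the running average. That smoothing is the usual motivation for sampling from the EMA weights rather than the raw ones, and it matches the poster's reading that the EMA copy never feeds back into training.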
    [P] Benchmarking and profiling Hugging Face training with Graphsignal
    We've recently added Hugging Face support to https://github.com/graphsignal/graphsignal profiler, which I'd like to share in case someone finds it useful in their efforts to optimize speed and compute. More details, code and screenshots in the blog post https://graphsignal.com/blog/benchmarking-and-profiling-hugging-face-training-with-graphsignal/. submitted by /u/l0g1cs [link] [comments]
    [N] MIT/Meta AI released their new SOTA unsupervised sentence embedding model "DiffCSE"
    Researchers from MIT/Meta recently released a new framework for unsupervised sentence embedding. The performance seems to be better than SimCSE, the previous SOTA, by 2.3 absolute points on downstream tasks. The pretrained models are available on Huggingface. GitHub: https://github.com/voidism/DiffCSE arXiv: https://arxiv.org/abs/2204.10298 submitted by /u/virtualenv [link] [comments]  ( 1 min )
    [D] What do you think of the double standard where an AI learning from copyrighted material is “stealing”, but when a human does it that’s just education?
    Some useful optional considerations: Assume copies of the original copyrighted work are owned lawfully. Assume the AI maintains a single active copy to avoid group performance. The implications this has on banning possibly uploading your own consciousness to save your life on copyright grounds. This “stealing” concept people jump to seems to bring up some interesting logical contradictions. What do you think? submitted by /u/sext-scientist [link] [comments]  ( 2 min )
    [News] New Jupyter Notebook competition
    Are you passionate about coding, data science or Earth observation? https://preview.redd.it/wfwk9ifo4vv81.png?width=1920&format=png&auto=webp&s=5f11885bd8efe88986c181b565cc160534634f0b We're looking for bright-minded people from around the world to showcase their skills and develop new Jupyter Notebooks using Copernicus data! Sound interesting? Find out more here: https://www.eumetsat.int/science-blog/new-jupyter-notebook-competition submitted by /u/EUMETSAT [link] [comments]
    [P] K-Prototypes and evaluating model performance and drift
    I have reached a blocking point in a current project. We are using K-Prototypes to segment two populations (one with 100k elements and the other with 1M). In order to evaluate the clustering, during our K-Means stage we used silhouettes which are already implemented in sklearn. For the second stage, this became a problem. Either it was a time issue or a memory issue. So, for the 100k dataset, finding the silhouette profile took around 50 hours using a custom distance metric function. While this is feasible in the project's scope, for the 1M dataset the computation time would be highly impractical. As such, at the moment, we are stuck without a proper evaluation metric for our model. Sure, we can run Davies-Bouldin or other similar metrics, but the silhouette profile gave us much more detailed information. We are also moving the project to databricks. At first, the pyspark clustering evaluator had me hopeful, but it has very limited options regarding distance metrics. This is also an issue because the model is to be deployed in production and should have some metric informing when it needs to be retrained. While this point is still fluid, this is the preferred course of action. Has anyone faced similar issues using K-Prototypes? Or just silhouette profiles with custom distance metrics? submitted by /u/CaptMartelo [link] [comments]  ( 1 min )
    [P] We cleaned up Pascal and improved mAP by 13%
    How important is clean data for how your AI models perform? According to our experiments - very important. Using state-of-the-art confidence learning to clean up PASCAL, two people improved our primary model metric by 13% in a week. To learn more about our results and what we did check out our article: https://hasty.ai/content-hub/articles/cleaning-pascal-improving-map-by-13?utm_source=mk832ksa Disclaimer: We used our own platform to clean up the data and the article, therefore, contains self-promotion. However, the article mainly focuses on the results we achieved. submitted by /u/treebeard_hasty_ai [link] [comments]  ( 3 min )
    [R][P] Using language models for molecule captioning and text-based molecule generation
    Hi. We recently did some work on using language models for molecule captioning and text-based molecule generation. You can think of it as doing translation between molecules and natural language. Would love to know if you have any feedback 🤗. Arxiv: https://arxiv.org/abs/2204.11817 submitted by /u/SimilarShape9122 [link] [comments]  ( 1 min )
    [D] Making text-to-image even better - GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models, a 5-minute paper summary by Casual GAN Papers
    “Diffusion models beat GANs”. While true, the statement comes with several ifs and buts, not to say that the math behind diffusion models is not for the faint of heart. Alas, GLIDE, an OpenAI paper from last December took a big step towards making it true in every sense. Specifically, it introduced a new guidance method for diffusion models that produces higher quality images than even DALL-E, which uses expensive CLIP reranking. And if that wasn’t impressive enough, GLIDE models can be fine-tuned for various downstream tasks such as inpainting and text-based editing. As for the details, let’s dive in, shall we? Full summary: https://t.me/casual_gan/289 Blog post: https://www.casualganpapers.com/faster-diffusion-models-text-to-image-classifier-free-guidance/GLIDE-explained.html GLIDE arxiv / code Join the discord community and follow on Twitter for weekly AI paper summaries! submitted by /u/KirillTheMunchKing [link] [comments]  ( 1 min )
  • Open

    Guide to Iteratively Tuning GNNs
    submitted by /u/aidev2040 [link] [comments]
    Exploring Neural Networks Visually in the Browser
    submitted by /u/nickb [link] [comments]
    What has priority in the performance?
    I'm really new to neural networks and I was wondering how to speed up the process of selecting the best one when I do the training. What I mean is, among the training/validation/test splits, the number of hidden layers, the type of activation functions, and the size of each layer, how do I iterate to find the optimal combination without having to try every little mix? Is there a way to rank the impact of these 4 components on the MSE and iterate one at a time to select each aspect? Thanks in advance. submitted by /u/beppegrosso97 [link] [comments]  ( 1 min )
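    One common way to avoid trying every combination is random search: sample a fixed number of configurations and keep the best one. The sketch below is only an illustration of that idea. The search space, the `evaluate` callback and `toy_evaluate` are hypothetical stand-ins, not anything from the post; in practice `evaluate` would train a network and return its validation MSE.

```python
import random

# Hypothetical search space; names and values are illustrative only.
SEARCH_SPACE = {
    "hidden_layers": [1, 2, 3],
    "layer_size": [16, 32, 64, 128],
    "activation": ["relu", "tanh", "sigmoid"],
    "validation_fraction": [0.1, 0.2, 0.3],
}


def sample_config(rng):
    """Draw one random combination from the search space."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def random_search(evaluate, n_trials, seed=0):
    """Return (best_score, best_config) over n_trials random configurations."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        config = sample_config(rng)
        score = evaluate(config)  # user-supplied: train, then return validation MSE
        if best is None or score < best[0]:
            best = (score, config)
    return best


def toy_evaluate(config):
    """Stand-in for real training so the sketch runs; not a real MSE."""
    return abs(config["layer_size"] - 64) / 64 + 0.1 * abs(config["hidden_layers"] - 2)


best_score, best_config = random_search(toy_evaluate, n_trials=50)
```

    Tuning one component at a time is cheaper still, but it can miss interactions (for example, the best layer size may depend on the number of layers), which is why sampling whole configurations is often preferred.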
  • Open

    Guide to Iteratively Tuning GNNs
    submitted by /u/aidev2040 [link] [comments]
    Yuval Noah Harari: "One of the things many people don't realize about the AI revolution and the automation revolution: They imagine it as some kind of a one-time event ... This is an extremely unlikely scenario, because we are nowhere near the maximum potential of AI." (3-min. clip)
    submitted by /u/frog9913 [link] [comments]  ( 1 min )
    Dreamy Trippy AI generated Video! VQGAN CliP Rife-RealESRGAN upscale...
    submitted by /u/LordPewPew777 [link] [comments]
    AI Dream 33 - Battestar Galactica Nebula Explosion
    submitted by /u/LordPewPew777 [link] [comments]
    US: Cisco and Verizon collaborated on a successful proof of concept demo in Las Vegas 'meet the latency thresholds required for autonomous driving applications – replacing the costly roadside radios previously required to meet those needs.'
    submitted by /u/dannylenwinn [link] [comments]  ( 1 min )
    Last Week in AI: AI uses in government surveillance, AI teaches human drivers, actors union opposes AI actors, and more!
    submitted by /u/regalalgorithm [link] [comments]
    AI Dream 44 - Epic Cathedral Supernatural Visit
    submitted by /u/LordPewPew777 [link] [comments]
    Resources
    Do you know reliable sites for learning artificial intelligence? I'm studying computer engineering, but I already want to study something in advance. submitted by /u/oraudev [link] [comments]  ( 1 min )
    The Future of Apps: Intelligence
    https://blog.r2c.io/the-future-of-apps-intelligence/ submitted by /u/R2Consulting [link] [comments]
    We cleaned up Pascal and improved mAP by 13%
    How important is clean data for how your AI models perform? According to our experiments - very important. Using state-of-the-art confidence learning to clean up PASCAL, two people improved our primary model metric by 13% in a week. To learn more about our results and what we did check out our article: https://hasty.ai/content-hub/articles/cleaning-pascal-improving-map-by-13?utm_source=da39a3ee Disclaimer: We used our own platform to clean up the data and the article, therefore, contains self-promotion. However, the article mainly focuses on the results we achieved. submitted by /u/treebeard_hasty_ai [link] [comments]  ( 1 min )
    Interested in cognitive science? "Joscha Bach Bits" is a new YouTube channel dedicated to the renowned cognitive scientist Joscha Bach...
    As featured on the Lex Fridman podcast, the Singularity weblog podcast and the Future of Life Institute podcast The channel features shorts of Joscha's opinions and perspectives edited from podcasts. You can check out the trailer, which mostly consists of podcast hosts' minds imploding. All channel videos Channel playlists Channel creator: /u/24karate Enjoy 🤖 submitted by /u/tasinet [link] [comments]  ( 1 min )
    Machine learning's abiding weakness is verification
    submitted by /u/koavf [link] [comments]
    Artificial Nightmares: Monsters Inc || Clip Guided Diffusion AI Art Video [4K 20 FPS]
    submitted by /u/Thenamessd [link] [comments]
    A.I. Is Mastering Language. Should We Trust What It Says? • OpenAI’s GPT-3 and other neural nets can now write original prose with mind-boggling fluency — a development that could have profound implications for the future.
    submitted by /u/Naurgul [link] [comments]  ( 1 min )
    Making text-to-image even better - GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models, a 5-minute paper summary by Casual GAN Papers
    “Diffusion models beat GANs”. While true, the statement comes with several ifs and buts, not to say that the math behind diffusion models is not for the faint of heart. Alas, GLIDE, an OpenAI paper from last December took a big step towards making it true in every sense. Specifically, it introduced a new guidance method for diffusion models that produces higher quality images than even DALL-E, which uses expensive CLIP reranking. And if that wasn’t impressive enough, GLIDE models can be fine-tuned for various downstream tasks such as inpainting and text-based editing. As for the details, let’s dive in, shall we? Full summary: https://t.me/casual_gan/289 Blog post: https://www.casualganpapers.com/faster-diffusion-models-text-to-image-classifier-free-guidance/GLIDE-explained.html GLIDE arxiv / code Join the discord community and follow on Twitter for weekly AI paper summaries! submitted by /u/KirillTheMunchKing [link] [comments]  ( 1 min )
  • Open

    Build and deploy a scalable machine learning system on Kubernetes with Kubeflow on AWS
    In this post, we demonstrate Kubeflow on AWS (an AWS-specific distribution of Kubeflow) and the value it adds over open-source Kubeflow through the integration of highly optimized, cloud-native, enterprise-ready AWS services. Kubeflow is the open-source machine learning (ML) platform dedicated to making deployments of ML workflows on Kubernetes simple, portable and scalable. Kubeflow provides many […]  ( 15 min )
    Create random and stratified samples of data with Amazon SageMaker Data Wrangler
    In this post, we walk you through two sampling techniques in Amazon SageMaker Data Wrangler so you can quickly create processing workflows for your data. We cover both random sampling and stratified sampling techniques to help you sample your data based on your specific requirements. Data Wrangler reduces the time it takes to aggregate and […]  ( 7 min )
    Part 4: How NatWest Group migrated ML models to Amazon SageMaker architectures
    The adoption of AWS cloud technology at NatWest Group means moving our machine learning (ML) workloads to a more robust and scalable solution, while reducing our time-to-live to deliver the best products and services for our customers. In this cloud adoption journey, we selected the Customer Lifetime Value (CLV) model to migrate to AWS. The […]  ( 12 min )
    Part 3: How NatWest Group built auditable, reproducible, and explainable ML models with Amazon SageMaker
    This is the third post of a four-part series detailing how NatWest Group, a major financial services institution, partnered with AWS Professional Services to build a new machine learning operations (MLOps) platform. This post is intended for data scientists, MLOps engineers, and data engineers who are interested in building ML pipeline templates with Amazon SageMaker. […]  ( 8 min )
    Part 2: How NatWest Group built a secure, compliant, self-service MLOps platform using AWS Service Catalog and Amazon SageMaker
    This is the second post of a four-part series detailing how NatWest Group, a major financial services institution, partnered with AWS Professional Services to build a new machine learning operations (MLOps) platform. In this post, we share how the NatWest Group utilized AWS to enable the self-service deployment of their standardized, secure, and compliant MLOps […]  ( 12 min )
    Part 1: How NatWest Group built a scalable, secure, and sustainable MLOps platform
    This is the first post of a four-part series detailing how NatWest Group, a major financial services institution, partnered with AWS to build a scalable, secure, and sustainable machine learning operations (MLOps) platform. This initial post provides an overview of the AWS and NatWest Group joint team implemented Amazon SageMaker Studio as the standard for […]  ( 11 min )
    Accelerate data preparation with data quality and insights in Amazon SageMaker Data Wrangler
    Amazon SageMaker Data Wrangler is a new capability of Amazon SageMaker that helps data scientists and data engineers quickly and easily prepare data for machine learning (ML) applications using a visual interface. It contains over 300 built-in data transformations so you can quickly normalize, transform, and combine features without having to write any code. Today, […]  ( 7 min )
  • Open

    Reward after each action vs reward after taking all actions in an episodic environment of N steps
    Hello, I am working on compression of deep neural networks using reinforcement learning. There is one agent that learns to compress convolutional layers using C actions and another one that compresses dense layers using D actions. If there are two conv layers and 3 dense layers, 5 actions have to be selected in a sequence using both agents in order to fully compress the model. I read the paper AdaDeep and found it really useful for my research, but I don't get why the authors select all actions and they only calculate the reward after completely compressing the network instead of getting the reward after each action. In their place I would select the action, calculate the reward of that action and store it in the replay. By only using immediate reward, the agent should be able to learn which sequence of actions would work the best for the current model. Why assign the same reward to the selected actions for each layer? Is it only because the outcome was due to the combination of actions and they want to speed up training? If my understanding is correct, assigning the immediate reward to each action would yield the same results in the long run. Thanks in advance. submitted by /u/ElvishChampion [link] [comments]  ( 1 min )
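    One way to look at the question above is through returns rather than rewards. The sketch below is an illustration under assumptions, not the AdaDeep authors' stated rationale: if the only reward arrives after the last layer is compressed and there is no discounting, every action in the sequence has the same return, which looks like "assigning the same reward to each action". The function name and the example rewards are made up.

```python
def discounted_returns(rewards, gamma=1.0):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    running = 0.0
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


# Five compression actions (2 conv + 3 dense layers), reward only at the end.
terminal_only = [0.0, 0.0, 0.0, 0.0, 1.0]
same_for_all = discounted_returns(terminal_only, gamma=1.0)

# With gamma = 0 each action is credited with its immediate reward only,
# so any effect it has on later layers is ignored.
immediate_only = discounted_returns([0.5, 0.2, 0.3], gamma=0.0)
```

    Learning from immediate rewards alone corresponds to the `gamma=0.0` case, so it matches the end-of-episode scheme only when each layer's action has no influence on the rewards of later layers.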
    Multi Agent RL: agents act in different frequencies?
    After reading D, Multi's post, I'm wondering: is it possible for two different agents to take actions in their own action spaces and at their own frequencies? submitted by /u/YMXin1999 [link] [comments]
    I’m going to build a game where the goal will be making many deliveries using the shortest route. How might the environment best be represented to the agent?
    The environment makes a lot of sense to me, as a human. It’s a network composed of nodes and edges. Nodes are the points in the network, and edges are the lines connecting them. The entire thing resembles a real street network and uses coordinate points to position itself on a graph. Of the entire map, select nodes will represent gas stations, and other select nodes will represent stop locations for deliveries. The agent will start at a node and map a route to each location, essentially by stringing together an array of connected edges. It’ll also need to travel to gas stations every X miles or it’ll run out of gas. Now, I’ve never done this before so I’m gonna bounce some of my ideas off a wall here Passing the entire thing to an agent and having it render a graph and whatnot to determ…  ( 2 min )
    What is a train step counter?
    In this repository that I'm looking at, there's an input variable whose meaning I don't understand. At line 135: https://github.com/google-research/google-research/blob/722494ce68130a7409bf94500002c79014905d53/social_rl/multiagent_tfagents/multiagent_ppo.py#L135 train_step_counter: An optional counter to increment every time the train op is run. Defaults to the global_step. What is train_step_counter? submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
    exponential weighted average
    Hey guys, I guess this may be trivial for most of you, but I can't get my head around it: how do I prove this is an exponentially weighted average? Thanks. The equation is labeled (10) in the link https://chowdera.com/2021/12/202112200806190809.html, from Reinforcement Learning: An Introduction by Sutton and Barto, "Tracking a Nonstationary Problem" in chapter 2. submitted by /u/ma7modbasha [link] [comments]  ( 1 min )
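    A sketch of the usual argument, assuming equation (10) at the link is the constant-step-size update from that section of Sutton and Barto, with step size $\alpha \in (0, 1]$, initial estimate $Q_1$, and rewards $R_1, \dots, R_n$ (the symbols are assumed to match the book's notation, which was not checked against the linked page):

```latex
\begin{align*}
Q_{n+1} &= Q_n + \alpha\,(R_n - Q_n) \\
        &= \alpha R_n + (1-\alpha)\,Q_n \\
        &= \alpha R_n + (1-\alpha)\bigl[\alpha R_{n-1} + (1-\alpha)\,Q_{n-1}\bigr] \\
        &= \alpha R_n + (1-\alpha)\,\alpha R_{n-1} + (1-\alpha)^2 Q_{n-1} \\
        &\;\;\vdots \\
        &= (1-\alpha)^n Q_1 + \sum_{i=1}^{n} \alpha\,(1-\alpha)^{n-i} R_i .
\end{align*}
% The weights sum to one, so this is a weighted average:
\begin{equation*}
(1-\alpha)^n + \sum_{i=1}^{n} \alpha\,(1-\alpha)^{n-i}
  = (1-\alpha)^n + \alpha \cdot \frac{1-(1-\alpha)^n}{\alpha} = 1 .
\end{equation*}
```

    The weight on $R_i$ is $\alpha(1-\alpha)^{n-i}$, which decays exponentially with the age $n - i$ of the reward; that decay is what makes the average "exponentially weighted".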
  • Open

    Data quality: What and why is it important?
    With the internet producing quintillions of readily available information per day, you could be forgiven to think that data is losing its value. Apparently, data is one of those weird commodities that go up in value the more they are available, or perhaps we haven’t produced enough to attain the demand-supply equilibrium. Virtually all companies… Read More »Data quality: What and why is it important? The post Data quality: What and why is it important? appeared first on Data Science Central.  ( 3 min )
    2 Ways in Which Automatic Data Labeling Saves Time and Costs
    Data scientists face a problem: machine learning models need to be trained on labeled datasets, but labeling the data is tedious and time-consuming. Enter automatic data labeling, in which most of the preprocessing work is done by a computer. At first glance, automatic data labeling sounds too good to be true. Of course, more automation…  ( 4 min )
    DSC Weekly Newsletter 26 April 2022: Why The Case for RTO Remains Weak
    As the Omicron variant of Covid-19 surged around the globe in 2021, managers who had begun contingency plans for a return to the office quietly shelved them to wait out the next wave. Heading into the summer of 2022, the omicron-delta variant lurks on the horizon, though whether or not this will trigger the massive…  ( 10 min )
    4 Successful Integrated Marketing Communications Examples
    Integrated Marketing Communications (IMC) is an effective communication process that is intended to strengthen the relationship between the customer and the company while enhancing the company’s sales. IMC utilizes a combination of traditional and new approaches in marketing. IMC uses the channel that is most effective to reach the customer. This blog will look at…  ( 4 min )
    No, AI won’t replace astronauts – and here’s why
    A new book predicts artificial intelligence will soon replace astronauts. The authors posit that robots are cheaper, more reliable, and better suited to space travel. But with the human desire for exploration, AI is unlikely to replace astronauts fully. AI will close the gap with human capabilities in the next few decades and surpass them…  ( 4 min )
    Smart Factory- Building Future with 5G
    The implementation of digital technologies blurs the line between the physical and digital world. It has become clear that there is a strong need for digital transformation to achieve the next level of efficiency, connectivity, and flexibility needed in manufacturing to weather modern-day disruptions, risks, and fluctuating demands. 5G SMART believes that 5G will be…  ( 2 min )
    Using Stakeholder Journey Maps to Re-invent, not Just Optimize, Your Business Processes
    Stakeholder Journey Maps are a fabulous tool to intimately understand what a stakeholder is trying to accomplish (their objectives and intentions) and the steps/actions/decisions that stakeholder needs to make to complete their journey. Stakeholder Journey Maps are commonly used to help designers to create the optimal user interface and nicely segue into UI storyboards and…  ( 5 min )
    An analysis of Digital Twin Applications across industries
    Background Digital Twins are virtual representations of physical objects, and they can be connected with their physical counterparts. Through this connection, Digital Twins contribute to the convergence of the real and the virtual world. While the Digital twin’s concept is focused on the manufacturing industry, the paper “Dimensions of Digital Twin Applications – A Literature…  ( 6 min )
    Make Sure Your Online Data Science Courses Teach These 6 Core Skills
    Data science is a wide field with many specializations, and an individual can have a great career with a data science degree. However, curriculums vary between schools, and the specific data science classes taught in one school may not be taught in another. There are several core skills in the data science field that recruiters…  ( 3 min )
    What Personal Knowledge Graphs Have to Do with Business
    I help lead a working group focused on personal knowledge graphs (PKGs). Lately, it’s functioned as a discussion and demo evaluation group for new technologies and how they might be used in a knowledge graph context. Different individuals want to annotate different kinds of data. Some do a lot of research. For them, the need is…  ( 4 min )
    Healthcare App Development: Why You Should Opt for React
    The world of healthcare has consistently evolved, yes, but the fact remains it has gone through tremendous change ever since the coronavirus pandemic started, thus driving the need for modern solutions to meet the increasingly varying needs of patients. In this context, mobile apps have proven to be the leading tool that has driven focus…  ( 3 min )
    Why Agile Often Fails and What to Do When It Happens
    Agile, Agile 2 and Agility, Part III In the previous articles in this series, we discussed the role that agile digital delivery capabilities play in your company’s competitiveness and why rapid delivery is so important. This article will look at the many reasons that Agile adoptions frequently fail to deliver what companies expect and suggest…  ( 7 min )
    How to Protect Your Computer Data
    When it comes to protecting your computer, you can do a few basic things. First, create separate user accounts for work and personal data. Make sure to back up your data and use a firewall. Also, make sure to encrypt it. You should make sure to back up any important documents or photos you may have…  ( 4 min )
  • Open

    Misconceptions about AI, Robotics, and Machine Learning
    Yes and no. So, if you asked this question, good one! When I was new to this stuff, I had the same question and searched up a lot about it.  ( 3 min )
  • Open

    In the NVIDIA Studio: April Driver Launches Alongside New NVIDIA Studio Laptops and Featured 3D Artist
    This week In the NVIDIA Studio, we’re launching the April NVIDIA Studio Driver with optimizations for the most popular 3D apps, including Unreal Engine 5, Cinema4D and Chaos Vantage. The driver also supports new NVIDIA Omniverse Connectors from Blender and Redshift. The post In the NVIDIA Studio: April Driver Launches Alongside New NVIDIA Studio Laptops and Featured 3D Artist appeared first on NVIDIA Blog.  ( 5 min )
  • Open

    A smarter way to develop new drugs
    A new artificial intelligence technique only proposes candidate molecules that can actually be produced in a lab.  ( 6 min )
  • Open

    Random Blaschke products and Mathematica binding
    A Blaschke product is a function that is the product of Blaschke factors, functions of the form b(z; a) = (|a| / a) (a – z) / (1 – a*z) where the complex number a lies inside the unit circle and a* is the complex conjugate of a. I wanted to plot Blaschke products with random […] Random Blaschke products and Mathematica binding first appeared on John D. Cook.  ( 2 min )

  • Open

    [D] Hypothetically, what's the value in being able to label ~500 million images a day?
    I'm not going to make this vague. Specifically I'm seeing a lot of comments related to Elon Musk saying he's going to remove bots from Twitter. There's a lot of speculation on how this could be done with comments suggesting a captcha-based system for every post/reply (or maybe every action?). More specifically people seem fixated on captcha systems that can't be botted. (Ignore for a moment the accessibility issues and audio fallbacks that might be required). I'm aware that Tesla transitioned to using more massive synthetic datasets for training, so this might be somewhat outdated. That said they do have a lot of data collection of new real world data from their sensors. This has me curious on the estimated value of large-scale captcha systems directly tied with a company that might need large-scale labeling services. I'm sure researchers here have run numbers on labeling services and how best to utilize them. (Goes without saying the prices vary quite a bit and cover a wide range of tasks from simple bounding boxes to relatively expensive polygons for semantic labeling. For example just as a reference: https://cloud.google.com/ai-platform/data-labeling/pricing ). Most services don't list prices for like 500 million tasks which makes sense given that's a lot. A captcha system would have independent users redoing tasks multiple times to find baselines, but in general it would average to find a ground truth label. This redundant work isn't wasted in this sense as it can find say difficult scenarios and refine labels. I could naively say 500 million tasks × (10 USD / 1000 tasks) = 5 million USD a day not counting server costs and development. Not super scientific though and it seems high. (I have doubts on if gathering 500 million samples of value a day to even get labeled is realistic). I digress, what would you say is the hypothetical value of such a system using short captcha tasks? submitted by /u/Sirisian [link] [comments]  ( 2 min )
    [N] Modular Reasoning, Knowledge, & Language (MRKL) Hybrid System For More 'General' NLP
    AI21 Labs’ Modular Reasoning, Knowledge and Language (MRKL, pronounced “miracle”) system and Jurassic-X include one or more language models and augment them with external knowledge sources as well as symbolic reasoning experts that can handle tasks that lie beyond the reach of neural models. There are 55 different task-specific modules that MRKL currently supports. If the router is unsure which module is best, it calls on Jurassic-1. Jurassic also helps compose the contextual language around MRKL’s response. This allows MRKL to give factual answers with up-to-date information instead of being limited to its training data alone, and gives it the ability to carry out a much wider range of NLP tasks as compared to other LLMs like Google's PaLM or OpenAI's GPT-3. AI21 blog and whitepaper here Video here submitted by /u/SlightSituation [link] [comments]  ( 1 min )
    [D] Opinions on NVIDIA TAO Toolkit?
    I'm working on an Edge ML product where we train models in the cloud and then run them on a device using tensorRT. We're considering switching to using the Nvidia TAO Toolkit for training. If you've used TAO, do you like it? Is it limiting? Our alternative is training in pytorch and then converting to ONNX and then tensorRT separately. Thanks! submitted by /u/linguistBot [link] [comments]  ( 1 min )
    [R][P] An arxiv-sanity-like view of ICLR 2022 papers
    submitted by /u/tanelai [link] [comments]  ( 1 min )
    [D] Paper Explained - ACCEL: Evolving Curricula with Regret-Based Environment Design (Video Walkthrough)
    https://youtu.be/povBDxUn1VQ Automatic curriculum generation is one of the most promising avenues for Reinforcement Learning today. Multiple approaches have been proposed, each with their own set of advantages and drawbacks. This paper presents ACCEL, which takes the next step into the direction of constructing curricula for multi-capable agents. ACCEL combines the adversarial adaptiveness of regret-based sampling methods with the capabilities of level-editing, usually found in Evolutionary Methods. ​ OUTLINE: 0:00 - Intro & Demonstration 3:50 - Paper overview 5:20 - The ACCEL algorithm 15:25 - Looking at the pseudocode 23:10 - Approximating regret 33:45 - Experimental results 40:00 - Discussion & Comments ​ Website: https://accelagent.github.io Paper: https://arxiv.org/abs/2203.01302 submitted by /u/ykilcher [link] [comments]  ( 1 min )
    [D] Parameter Efficiency without Computational Efficiency
    Hello hivemind. Context I am working on streamlined design strategies for (manual) neural architecture design. I recently came across a simple, receptive field-based strategy that allows me to reliably improve the architectures of some SOTA models like EfficientNet by up to +1.5% top-1 accuracy. That being said, there seems to be a trade-off between performance and number of computations I put into the same number of parameters. Basically, the more computationally expensive the architectural change is, the better the performance turns out to be. The number of parameters does not change in the process. The model becomes therefore more parameter efficient but computationally less efficient, which you could consider a somewhat Pyrrhic victory. Now to my question: Is there use for parameter efficiency in models when it does not also coincide with computational efficiency? Is there any literature on the topic you can recommend? submitted by /u/KrakenInAJar [link] [comments]  ( 1 min )
    [D] Calculating feature importance: which model/training set to use in the case of cross validation?
    So far I've only seen feature importance calculated on one model/training set at a time. However, with cross validation there are many models built on different training sets. How do you calculate feature importance in this case? Just do it on one model? Or calculate feature importance for each model and somehow aggregate it (e.g., by averaging)? submitted by /u/Comprehensive-Egg707 [link] [comments]  ( 2 min )
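    The averaging option mentioned above can be sketched as follows. This is only an illustration of the aggregation step: the function name and the per-fold numbers are made up, and how each fold's importances are obtained (impurity-based, permutation, etc.) is left open. Reporting the spread alongside the mean shows how stable a feature's ranking is across folds.

```python
from statistics import mean, stdev


def aggregate_importances(per_fold):
    """Combine per-fold importances into {feature: (mean, std)}.

    per_fold: list of dicts, one per cross-validation fold, each mapping
    feature name -> importance computed from that fold's model.
    """
    summary = {}
    for name in per_fold[0]:
        values = [fold[name] for fold in per_fold]
        spread = stdev(values) if len(values) > 1 else 0.0
        summary[name] = (mean(values), spread)
    return summary


# Made-up importances from three folds, for illustration only.
folds = [
    {"age": 0.50, "income": 0.30, "zip": 0.20},
    {"age": 0.40, "income": 0.40, "zip": 0.20},
    {"age": 0.60, "income": 0.20, "zip": 0.20},
]
summary = aggregate_importances(folds)

# Rank features by mean importance, highest first.
ranked = sorted(summary.items(), key=lambda item: item[1][0], reverse=True)
```

    A feature with a high mean but a large standard deviation is one whose importance depends heavily on which training split was used, which is worth knowing before drawing conclusions from a single model.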
    [R] Recent Trends In Diffusion-Based Text-Conditional Image Synthesis
    Hello, I wrote a blog post about text-conditional image generation using diffusion models (including DALL-E 2). Let me know what you think! https://sangyun884.github.io/recent-trends-in-diffusion-based-text-conditional/ submitted by /u/Impressive-Mirror430 [link] [comments]
    [N][P] Use GitHub Actions for ML with DagsHub Connect
    Hey r/MachineLearning, Nir from DagsHub here, and I'm thrilled to share a project we're launching today that will hopefully unlock GitHub Actions for ML. GitHub Actions solved a DevOps burden many of us felt by providing an easy-to-configure CI/CD tool to build, test, and deploy pipelines. However, when it comes to ML pipelines, and working with data, models, and experimentation in mind, the workflow is not as well defined and can get tricky to implement. DagsHub is kind of like GitHub for machine learning, which extends what GitHub did for code management to track data, models, experiments, and data pipelines. We do this by integrating awesome open source tools like DVC, MLflow, Label Studio, and more. One of our most requested features was a deeper integration with GitHub that will let…  ( 1 min )
    [R] ICLR 2022 Blog Post: The 37 Implementation Details of Proximal Policy Optimization
    Hi folks, our ICLR 2022 Blog post on "The 37 Implementation Details of Proximal Policy Optimization" is live 😀 Our post makes it easier to understand the nitty-gritty PPO's implementations with 1) 🎥 video tutorials 2) 📜 detailed references and explanations 3) ⌨️ really simple code Here are the links: Official URL: https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/ Twitter thread: https://twitter.com/vwxyzjn/status/1518589115163369472 OpenReview link:https://openreview.net/forum?id=Hl6jCqIp2j GitHub repo: https://github.com/vwxyzjn/ppo-implementation-details YouTube tutorial on PPO: https://www.youtube.com/playlist?list=PLD80i8An1OEHhcxclwq8jOMam0m0M9dQ_ I am the main author & feel free to ask me anything here. submitted by /u/vwxyzjn [link] [comments]  ( 1 min )
    [N] Learn how the open-source ecosystem can be used in your machine learning and data science classes.
    Hey, 🤗 Hugging Face is offering a workshop (June 6) for instructors of machine learning and data science who would like to learn how the open-source ecosystem can be used in their classes. After this workshop, you will know how to: 🧑‍💻 Teach Transformers models & famous ML libraries 🤖 Onboard students to the Hub to build/host projects 💾 Publish models/datasets in a few lines of code During the workshop, you will be invited to join the following page for a better understanding of our open-source solutions: https://huggingface.co/teach For more details about the workshop content, visit: https://hf.co/teaching Feel free to register here:) submitted by /u/VioletteLep [link] [comments]  ( 1 min )
    [D] Does anyone know a large varied image dataset that does NOT contain humans?
    It could be pictures of anything, except of humans. But it would be better if it were not focused on a single topic like dog images. submitted by /u/TheManveru [link] [comments]  ( 1 min )
    [R] A new dataset and a library that you can use for ML and RL over the Web
    TL;DR: Download dataset of labelled Web pages, WebTraversalLibrary for scripting web interactions Hi everyone! Our group at Klarna has been putting in a ton of work into deep learning for the Web over the past few years and we've made a couple of useful resources available for the research community. You might find them interesting if you're looking for new ideas for spare-time or even post-grad research projects. We've open-sourced a dataset of about 50k labeled product web pages from roughly 8000 distinct e-commerce merchants, available as MHTML and WebTraversalLibrary clones (see next point :) ), along with the corresponding screenshots. Not all of the MHTMLs render correctly, but the ones that do also have screenshots in a corresponding dataset for CV applications. You can find doc…  ( 2 min )
    [D] Is anyone working on open-sourcing Dall-E 2?
    Just like Eleuther did with GPT3? submitted by /u/invertedpassion [link] [comments]  ( 2 min )
    [P] Demo of Google's new PHORUM Image➠3D Figure Project
    submitted by /u/NichodonARG [link] [comments]
  • Open

    Chatbots
    Does anyone know where chatbots, specifically the ones on Chai, get their information? This one knew what meta-analysis was and knew who the main character of a show was. I do not believe the bots are real people, based on both the information I've seen and my experience with them glitching out. Plus Occam's razor, yk. I'm guessing they can use Google or something, but it's odd because they clearly don't have access to the time. submitted by /u/Shadowfax42- [link] [comments]  ( 1 min )
    Arcane Style Transfer
    submitted by /u/Illustrious_Row_9971 [link] [comments]
    The most basic AI scheme.
    submitted by /u/idvlknv [link] [comments]
    Nvidia AI Designs GPU 3,600x Faster | Breakthrough MRKL NLP Techniques | AI Mind To Predict Dementia
    submitted by /u/getrich_or_diemining [link] [comments]
    Is there an AI that is able to turn normal pictures into that kind of picture, or into something I am able to plot with my pen plotter?
    submitted by /u/xXNOdrugsForMEXx [link] [comments]  ( 1 min )
    What are cool AI tools which I could use to edit images/videos?
    (I have no experience with photoshop) submitted by /u/xXNOdrugsForMEXx [link] [comments]
    Are You The Asshole? New AI Mimics Infamous Advice Subreddit
    submitted by /u/Emmanuel_T_Goldstein [link] [comments]
    Bias in Artificial Intelligence; Is Diversity the Key to the Future Of AI?
    submitted by /u/JencyJane [link] [comments]
    Ai Becomes Sentient 3
    submitted by /u/webauteur [link] [comments]
    To be AI-first, do AI last
    submitted by /u/bendee983 [link] [comments]
    GPT-3 not available in my country
    Is there a way to access any of OpenAI's APIs if they're not available in my country? submitted by /u/dogaryy [link] [comments]  ( 1 min )
    Text Summarization with Huggingface Transformers and Python
    submitted by /u/RubiksCodeNMZ [link] [comments]
    Reface app deepfake technology
    submitted by /u/Aggravating-Deal-260 [link] [comments]
    Demo of Google's new PHORUM Image→3D Figure Project
    submitted by /u/NichodonARG [link] [comments]
  • Open

    AI Hitman Learns to Find Waldo
    submitted by /u/TernaryJimbo [link] [comments]
    Help with walk forward validation in LSTM problem
    Hi, I am having some trouble with walk forward validation in my LSTM. The problem is described in this stackoverflow post: https://stackoverflow.com/questions/71990833/using-predictions-instead-of-observed-values-in-walk-forward-validation-in-lstm If anyone could help me, that would be much appreciated. submitted by /u/magnussendjoko [link] [comments]  ( 1 min )
    I don't think this was posted here before but it's incredible. The latest image generation from text
    submitted by /u/gwtkof [link] [comments]
    NN from Scratch: #5 Updating parameters | Kolbenkraft
    submitted by /u/cjmodi306 [link] [comments]
    Text Summarization with Huggingface Transformers and Python
    submitted by /u/RubiksCodeNMZ [link] [comments]
  • Open

    Host Hugging Face transformer models using Amazon SageMaker Serverless Inference
    The last few years have seen rapid growth in the field of natural language processing (NLP) using transformer deep learning architectures. With its Transformers open-source library and machine learning (ML) platform, Hugging Face makes transfer learning and the latest transformer models accessible to the global AI community. This can reduce the time needed for data […]  ( 8 min )
    How Nordic Aviation Capital uses Amazon Rekognition to streamline operations and save up to EUR200,000 annually
    Nordic Aviation Capital (NAC) is the industry’s leading regional aircraft lessor, serving almost 70 airlines in approximately 45 countries worldwide. In 2021, NAC turned to AWS to help it use artificial intelligence (AI) to further improve its leasing operations and reduce its reliance on manual labor. With Amazon Rekognition Custom Labels, NAC built an AI […]  ( 5 min )
  • Open

    Estimating the informativeness of data
    MIT researchers can now estimate how much information data are likely to contain, in a more accurate and scalable way than previous methods.  ( 6 min )
    An easier way to teach robots new skills
    Researchers have developed a technique that enables a robot to learn a new pick-and-place task with only a handful of human demonstrations.  ( 7 min )
  • Open

    SimpleGrid env for OpenAI gym
    SimpleGrid is a simple gridworld environment for OpenAI gym. It is easy to use and customise and it is intended to offer an environment for quick testing and prototyping different RL algorithms. I developed this environment by taking inspiration from the FrozenLake environment and gym-minigrid. Check it out at: https://github.com/damat-le/gym-simplegrid ​ https://i.redd.it/prqd7muujqv81.gif submitted by /u/damat-le [link] [comments]  ( 1 min )
    Hybrid CPU topology impact on training
    Hi, some newer CPUs (Apple M1, Intel Alder Lake) have started using a hybrid CPU topology, i.e. the CPU consists of some P-cores (high performance cores) and E-cores (slower 'eco' cores). I personally do not own such a CPU yet but I'm considering upgrading to one soon, so I am looking for experiences of people with such a CPU on training reinforcement learning models. Does everything work as expected? Are there annoyances? I'm using Ray Tune + RLlib in which I assign a single core per trial. In this case I'm expecting that trials running on E-cores will simply run (much) slower than those running on P-cores. In case a single trial gets assigned both a P-core and an E-core, I do expect a serious slow-down. These are all guesses really, so I'm looking for people with actual experience here. Thanks. submitted by /u/katsu9 [link] [comments]  ( 1 min )
    Policy Iteration on OpenAI Gym taxi-v3
    Hey everyone, I managed to implement policy iteration from Sutton & Barto, 2018 on FrozenLake-v1 and now wanted to do the same on the Taxi-v3 environment. My code has now been running for 45 min, so I guess there is something wrong, but I can't wrap my head around what it could be. Would appreciate some input on what I need to change so that it will work. Please see my code here: ```[python] import gym # openAi gym import torch import matplotlib.pyplot as plt from tqdm import trange # progressbar torch.manual_seed(4) env = gym.make('Taxi-v3') def policy_evaluation(env: gym.Env, policy: torch.Tensor, gamma: float, threshold: float): V = torch.zeros(env.observation_space.n) delta = float("inf") while delta >= threshold: V_tmp = torch.empty(env.observation_space.n) for state in range(…  ( 1 min )
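    The post's code is cut off, so it cannot be debugged here. For comparison, this is a minimal tabular policy iteration sketch in the style of Sutton & Barto. It runs on a hypothetical 4-state chain MDP rather than Taxi-v3, and the transition table `P`, the constants, and all function names are assumptions made for illustration, not the poster's code.

```python
# Hypothetical deterministic chain MDP: states 0..3, state 3 is terminal.
# P[state][action] = (next_state, reward, done); actions: 0 = left, 1 = right.
N_STATES, N_ACTIONS, GAMMA = 4, 2, 0.9

P = {
    s: {
        0: (max(s - 1, 0), 0.0, False),
        1: (s + 1, 1.0 if s + 1 == 3 else 0.0, s + 1 == 3),
    }
    for s in range(3)
}
P[3] = {a: (3, 0.0, True) for a in range(N_ACTIONS)}  # absorbing terminal state


def q_value(V, s, a):
    """One-step lookahead; terminal transitions do not bootstrap."""
    nxt, reward, done = P[s][a]
    return reward + (0.0 if done else GAMMA * V[nxt])


def policy_evaluation(policy, threshold=1e-8):
    """Iterative in-place evaluation of a deterministic policy."""
    V = [0.0] * N_STATES
    while True:
        delta = 0.0
        for s in range(N_STATES):
            v_new = q_value(V, s, policy[s])
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < threshold:
            return V


def policy_iteration():
    policy = [0] * N_STATES  # start with "always left"
    while True:
        V = policy_evaluation(policy)
        stable = True
        for s in range(N_STATES):
            best = max(range(N_ACTIONS), key=lambda a: q_value(V, s, a))
            # Switch only on a strict improvement, so ties between equally
            # good actions cannot make the policy flip back and forth forever.
            if q_value(V, s, best) > q_value(V, s, policy[s]) + 1e-12:
                policy[s] = best
                stable = False
        if stable:
            return policy, V


policy, V = policy_iteration()
```

    Two things in this sketch are worth checking against any implementation that never finishes: the evaluation loop needs `gamma < 1` (or guaranteed termination) to converge, and the improvement step should not report a change when two actions merely tie. Whether either applies to the truncated code above cannot be told from the excerpt.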
    ICLR 2022 Blog Post: The 37 Implementation Details of Proximal Policy Optimization
    submitted by /u/vwxyzjn [link] [comments]  ( 1 min )
    Deep Reinforcement Learning Free Class by Hugging Face 🤗
    Hey there! We're happy to announce the launch of the Hugging Face Deep Reinforcement Learning class! 🤗 👉 Register here https://forms.gle/oXAeRgLW4qZvUZeu9 In this free course, you will: 📖 Study Deep Reinforcement Learning in theory and practice. 🧑‍💻 Learn to use famous Deep RL libraries such as Stable Baselines3, RL Baselines3 Zoo, and RLlib. 🤖 Train agents in unique environments with SnowballFight, Huggy the Doggo 🐶, and classical ones such as Space Invaders and PyBullet. 💾 Publish your trained agents in one line of code to the Hub. But also download powerful agents from the community. 🏆 Participate in challenges where you will evaluate your agents against other teams. 🖌️🎨 Learn to share your environments made with Unity and Godot. 👉 Register here https://forms.gle/oXAeRgLW4qZvUZeu9 📚 The syllabus: https://github.com/huggingface/deep-rl-class https://preview.redd.it/b409a9sscov81.jpg?width=1920&format=pjpg&auto=webp&s=ebdfa10c220b5a3dec17894bc0f955ed9d8f7634 If you have questions and feedback, I would love to answer them, Thanks, submitted by /u/cranthir_ [link] [comments]  ( 1 min )
    Has anyone solved deepmind control locomotion in state-based?
    Hi, I'm wondering if I could train state-based locomotion well from scratch. In my case, I trained an SAC agent and it showed divergence of the Q-function. I think the cause is the unscaled vector input. Before troubleshooting, I want to ask whether someone has trained an agent in this environment. Thank you for reading. https://github.com/deepmind/dm_control/tree/main/dm_control/locomotion submitted by /u/Spiritual_Fig3632 [link] [comments]  ( 1 min )
    Can't solve OpenAI problems
    I started off with OpenAI's mountain car, and I don't know where to even start. I got the setup working with the environment, but now have no idea how to train it. How do I learn to code for RL? Every tutorial I have seen so far has implemented Q Learning algorithms completely with little explanation. I looked at the solved code and it doesn't make sense to me. How should I prepare before I go into OpenAI? submitted by /u/TrepidationTD [link] [comments]  ( 1 min )
  • Open

    Google at ICLR 2022
    Posted by Cat Armato and Callan Hajosy, Program Managers The 10th International Conference on Learning Representations (ICLR 2022) kicks off this week, bringing together researchers, entrepreneurs, engineers and students alike to discuss and explore the rapidly advancing field of deep learning. Entirely virtual this year, ICLR 2022 offers conference and workshop tracks that present some of the latest research in deep learning and its applications to areas ranging from computer vision, speech recognition and text understanding to robotics, computational biology, and more. As a Platinum Sponsor of ICLR 2022 and Champion DEI Action Fund contributor, Google will have a robust presence with nearly 100 accepted publications and extensive participation on organizing committees and in workshops…  ( 12 min )
  • Open

    Projective duality
    The previous post explained how to define a projective plane over a field F. Now let’s look at how we do geometry in a projective plane. Definitions We have a definition of points from the other post: a point is a triple (a, b, c) of elements of F, with not all elements equal to […] Projective duality first appeared on John D. Cook.  ( 3 min )
    Finite projective planes
    Given a field F, finite or infinite, you can construct a projective plane over F by starting with pairs of elements of F and adding “points at infinity,” one point for each direction. Motivation: Bézout’s theorem A few days ago I mentioned Bézout’s theorem as an example of a simple theorem that rests on complex […] Finite projective planes first appeared on John D. Cook.  ( 4 min )
  • Open

    PPE: A fast and provably efficient RL algorithm for exogenous noise
    Picture a person walking in a park by a pond. The surrounding environment contains a number of moving objects that change the quality of the environment: clouds moving to hide the sun, altering the quality of light; ducks gliding across the pond, causing its surface to ripple; people walking along a path, their images reflecting […] The post PPE: A fast and provably efficient RL algorithm for exogenous noise appeared first on Microsoft Research.  ( 9 min )
  • Open

    Let Me Shoyu How It’s Done: Creating the NVIDIA Omniverse Ramen Shop
    When brainstorming a scene to best showcase the groundbreaking capabilities of the Omniverse platform, some NVIDIA artists turned to a cherished memory: enjoying ramen together in a mom-and-pop shop down a side street in Tokyo. Simmering pots of noodles, steaming dumplings, buzzing kitchen appliances, warm ambient lighting and glistening black ledger stools. These were all Read article > The post Let Me Shoyu How It’s Done: Creating the NVIDIA Omniverse Ramen Shop appeared first on NVIDIA Blog.  ( 4 min )
    Stellar Weather: Researchers Describe the Skies of Exoplanets
    A paper released today describes in the greatest detail to date the atmospheres on distant planets. Seeking the origins of what’s in and beyond the Milky Way, researchers surveyed 25 exoplanets, bodies that orbit stars far beyond our solar system. Specifically, they studied hot Jupiters, the largest and thus easiest to detect exoplanets, many sweltering Read article > The post Stellar Weather: Researchers Describe the Skies of Exoplanets appeared first on NVIDIA Blog.  ( 4 min )
  • Open

    Let’s Talk about Machine Translation: The powering engine behind “Google Translate”
    At some point, we’ve all used Google Translate, Microsoft, DeepL or Bing Translator to impress our friends/colleagues who speak a different…  ( 4 min )
  • Open

    Multiprocessing in Python
    When you work on a computer vision project, you probably need to preprocess a lot of image data. This is time-consuming, and it would be great if you could process multiple images in parallel. Multiprocessing is the ability of a system to run multiple processors at one time. If you had a computer with a […] The post Multiprocessing in Python appeared first on Machine Learning Mastery.  ( 9 min )
  • Open

    Should I Use Offline RL or Imitation Learning?
    Figure 1: Summary of our recommendations for when a practitioner should use BC and various imitation learning style methods, and when they should use offline RL approaches. Offline reinforcement learning allows learning policies from previously collected data, which has profound implications for applying RL in domains where running trial-and-error learning is impractical or dangerous, such as safety-critical settings like autonomous driving or medical treatment planning. In such scenarios, online exploration is simply too risky, but offline RL methods can learn effective policies from logged data collected by humans or heuristically designed controllers. Prior learning-based control methods have also approached learning from existing data as imitation learning: if the data is generally “goo…  ( 11 min )

  • Open

    [D] Legality of Hosting ImageNet
    Despite its immense popularity in academia, it's surprisingly difficult to download the ImageNet Object Localization dataset. As far as I can tell this is due to legal issues -- no single entity owns the copyright to the images, so no entity can host the whole dataset. The result is that if you want to use ImageNet you're forced to either manually scrape a million URLs (requiring both CPU time and your time, and imposing costs on a million unsuspecting websites), know somebody who has already done that, or fetch it from a legally questionable source. So I have a couple of questions: Is the owner of the ImageNet dataset on Kaggle performing a selfless public service, thanklessly accepting legal liability to make ImageNet more accessible? Or is she protected (e.g. by fair use)? Is she required to accept DMCA requests? If I'd like to share ImageNet (I've recently downloaded it and processed it to be ~10 GB, which seems like a helpful thing to share), is there any legally safe path for me to do this? submitted by /u/you-get-an-upvote [link] [comments]  ( 3 min )
    [D] Machine Learning - WAYR (What Are You Reading) - Week 136
    This is a place to share machine learning research papers, journals, and articles that you're reading this week. If it relates to what you're researching, by all means elaborate and give us your insight, otherwise it could just be an interesting paper you've read. Please try to provide some insight from your understanding and please don't post things which are present in wiki. Preferably you should link the arxiv page (not the PDF, you can easily access the PDF from the summary page but not the other way around) or any other pertinent links. Previous weeks: Week 1 through Week 135. Most upvoted papers two weeks ago: /u/CatalyzeX_code_bot: Paper link /u/lauren_v2: paper Besides that, there are no rules, have fun. submitted by /u/ML_WAYR_bot [link] [comments]  ( 1 min )
    [P] Showcase your Machine Learning Research/Projects in Hugging Face Spaces using Gradio
    submitted by /u/Illustrious_Row_9971 [link] [comments]
    [D] How is NVIDIA P100 on Google Colab Pro compared to Laptop with RTX3080 (Mobile, or Max-Q) ?
    submitted by /u/aviisu [link] [comments]  ( 6 min )
    [D] Simple Questions Thread
    Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead! Thread will stay alive until next one so keep posting after the date in the title. Thanks to everyone for answering questions in the previous thread! submitted by /u/AutoModerator [link] [comments]  ( 2 min )
    [D] What is the best optimizer to use when visualizing internal neurons of a network by optimizing a random input with respect to them?
    Hi, seemingly it's become a staple in conv net inner-working visualization to put a network in eval mode, sample a random noise image, and optimize the image with respect to the activation of some internal neuron. From what I saw, most examples of this on the internet use an Adam optimizer with a learning rate of 0.1 and a weight decay of 1e-6. This doesn't seem quite right to me, so if any of you know the source of this convention and whether there are other alternatives, I'd appreciate this information very much. Thanks! submitted by /u/ondrea_luciduma [link] [comments]  ( 1 min )
  • Open

    Projects to jump into RL?
    I have some experience with ML but not at all with RL. I know basic theory and want to just get started with the programming part. I saw OpenAI's gym, but I want to learn RL that can be applied anywhere. I don't want to be specifically constrained to OpenAI's gym and want something where I can apply it to game development, such as Unity. Are there any good resources to just do an RL project? submitted by /u/TrepidationTD [link] [comments]  ( 1 min )
    N Step Prioritized Replay Buffer
    I have a few questions about implementing the N Step version of Prioritized Replay Buffer (for Rainbow DQN). I'm implementing the Atari version of this buffer. To conserve memory, I'm only storing the states (and the last state of each episode) in an unstacked manner. That is, if the frame stack is 4, and the shape of states returned by the environment is 4, I'm storing only the last state of the stacked states. This way the buffer contains all transitions from each episode. As for the N Steps part, I'm only calculating the n_step states when getting states from the buffer instead of storing the N Step transitions directly. For the prioritized version, how do the priorities work? If I wasn't trying to conserve memory, I would have stored the N Step stacked states directly and update the priorities for the segment trees and the segment tree pointers only when moving the data from the N Step buffer to the main buffer. But now that I'm calculating the N Step experience directly when sampling and not when adding data to the buffer, when do I update the priorities and the tree pointer? Once per state? Once per stacked state? Once per N Step stacked state? When I update the priorities for the sampled batch, the priorities are associated with multiple states (because the states are stacked). But because I don't store the stacked states and only the raw unstacked states, to which of these states should I update the priority for? And because I don't store the n step transitions anymore, to which of the N Steps should the priorities be associated with? If I want to create such a buffer for a vector env, how would I go about doing it? I'm thinking of maintaining a separate segment tree for each env. Is that correct? Is there a better way? submitted by /u/SirRantcelot [link] [comments]  ( 1 min )
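    The "compute the N-step transition at sample time" idea in the post can be sketched as below. This is an illustrative assumption about the buffer layout (flat, non-circular `rewards` and `dones` lists, one entry per environment step), and `n_step_transition` is a hypothetical helper, not part of any library or of the poster's code.

```python
def n_step_transition(rewards, dones, start, n, gamma):
    """Build an n-step transition on the fly from per-step storage.

    Returns (discounted reward sum, index of the bootstrap state, done flag).
    Assumes either start + n <= len(rewards) or the episode ends earlier.
    """
    ret, discount = 0.0, 1.0
    for k in range(n):
        idx = start + k
        ret += discount * rewards[idx]
        discount *= gamma
        if dones[idx]:
            # Episode ended inside the window: stop accumulating and
            # signal that no bootstrapping should happen.
            return ret, idx + 1, True
    return ret, start + n, False


# Hypothetical 4-step episode with reward 1 at every step.
rewards = [1.0, 1.0, 1.0, 1.0]
dones = [False, False, False, True]
full_window = n_step_transition(rewards, dones, start=0, n=3, gamma=0.5)
cut_short = n_step_transition(rewards, dones, start=2, n=3, gamma=0.5)
```

    With this layout, one possible choice (an assumption here, not something the post settles) is to keep exactly one priority per `start` index, since each sampled transition is fully determined by the step it begins at; frame stacking and the N-step window are then both reconstructed from that index.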
    Confused between "centralized critic" and "centralized training decentralized execution"
    I don't understand if having a centralized critic in multi-agent RL is the same as having a centralized training decentralized execution approach. Can you help me clarify this? submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
    There's a "Yo Mama" joke in there somewhere!!
    submitted by /u/boss_007 [link] [comments]  ( 1 min )
    _Algorithms for Decision Making_, Kochenderfer et al 2022 (textbook draft; more classical ML than S&B)
    submitted by /u/gwern [link] [comments]  ( 1 min )
    Solving a large (dynamic) maze using DQN (reward & observation)
    Consider a fixed maze with size (100x400) and fixed start and endpoints (1,1) and (99,399), where doors dynamically appear around the agent every time an observation has been made. The doors disappear after some time step n (for simplicity say n=1). For pity's sake let's say the optimum path length is 1000 steps and there is only one way to reach it. I have two questions: (1) What would be the most appropriate way to frame the reward function? I have tested both a volatile approach (i.e. the episode is terminated if the agent steps into a wall or a previously made path, otherwise it is rewarded in "cheese" every 5 steps) and a not-so-volatile approach (i.e. free roam for a maximum of 2000 steps with a cheese reward every 5 steps). Evidently these do not work; are there other, perhaps more promising, approaches for such large mazes? (2) Does the colour of objects in a frame matter to the (convolutional) DQN's learning of the maze? Considering a 3 channel RGB input, is there perhaps a smarter way to colour code the walls/emptypath/etc.? It's a bit of a weird question but my curiosity arises from short-term memory examples (namely, space invaders) where the input, multiple greyscale frames, is chosen over one RGB frame - so to this degree channels can be understood as higher-level "features" of a frame, hence, can the colours of different objects be optimised? If so is there a logical way of viewing such optimisation? Any other advice is always welcome :) N.B. yes, this could probably be solved with other easier methods but let's just say DQN (or other deep-Q alternatives) is needed. submitted by /u/Background-Cable-491 [link] [comments]  ( 2 min )
    RL for classification problems
    Hello, I aim to use RL for a classification problem, but I can't see where the difference is between using RL and other ML algorithms that are used in classification (such as MLP, KNN, SVM ..), since we have a training phase in which we teach an agent (or ML algorithm) the class of each sample of a labeled dataset. OK, the manner of teaching is different but the concept is the same. Then, in a second phase, we test the model with a test set. My question is, if I choose RL for a classification problem, what contribution can we have compared to another algorithm? submitted by /u/fatenLouati [link] [comments]  ( 2 min )
    Design of next observation when collision for 2d continuous maze
    Hi, I am trying to create a continuous 2d maze environment. I tried several algorithms but none gives me a stable success rate of 1. From time to time, the agent gets stuck around an obstacle corner and fails to move anymore, as the picture below shows. I guess it relates to my bad design for giving the next observation for an action that can't make the agent move forward. Currently, my design evenly separates the action into 10 substeps, and returns the observation which corresponds to the step right before the agent meets an obstacle. It looks like I won't have this issue for a mujoco environment like Ant. But it is really hard to see their design. https://preview.redd.it/swohj8h8qev81.png?width=282&format=png&auto=webp&s=7353a62e99ff2141b3660c68ae34ce4f5cd9b94f submitted by /u/AnimatorRemarkable20 [link] [comments]  ( 1 min )
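    The substep scheme described in the post (split the action into 10 equal substeps and stop just before the first collision) might look roughly like this. `move_with_substeps` and the `is_free` collision predicate are hypothetical names chosen for this sketch; the poster's environment code is not shown.

```python
def move_with_substeps(pos, action, is_free, n_substeps=10):
    """Advance along `action` in equal substeps, stopping before a collision.

    `is_free(x, y)` is an assumed user-supplied predicate that returns True
    when the point (x, y) does not lie inside an obstacle.
    """
    x, y = pos
    dx, dy = action[0] / n_substeps, action[1] / n_substeps
    for _ in range(n_substeps):
        nx, ny = x + dx, y + dy
        if not is_free(nx, ny):
            break  # keep the last collision-free position
        x, y = nx, ny
    return (x, y)


# Hypothetical wall occupying everything at x >= 0.55.
blocked = move_with_substeps((0.0, 0.0), (1.0, 0.0), lambda x, y: x < 0.55)
unblocked = move_with_substeps((0.0, 0.0), (1.0, 0.0), lambda x, y: True)
```

    One property of this design is visible in the sketch: once the agent sits within one substep of a wall, any action pointing into the wall returns the same position, so the next observation is identical to the current one. That may be related to the "stuck at a corner" behaviour, though this is a guess rather than a diagnosis.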
    Why can't we make a perfect AI for Starcraft through evolution
    First of all, let's discuss what the level of AI is now. If the "level" refers to the capability of competing, the current AI is already very close to the top human players in some types of games, such as chess, Texas Poker, and Mahjong (board and card games), DOTA2 (MOBA), and StarCraft2 (RTS). As for other games, if we have enough human resources and computing performance, we can also get similar results. If the "level" has other meanings, like AI agents having human behavior, intelligent NPCs can be designed specifically for different people so that they can have different gaming experiences. These are all at the stage of defining the issues and exploring new technology solutions. Although traditional game AI is mostly based on hard code, it still has much prior knowledge. In recent years, some hot ML-r…  ( 4 min )
  • Open

    Interactive Course on Optimizing Search Engines With Ricardo Baeza-Yates Starting May 10
    Sponsored Post Search systems are in the process of being revolutionized by Deep Learning and AI applications. To successfully evaluate, build, deploy and scale information retrieval systems, engineers working with search systems must understand the frameworks and algorithms that underpin this technology. Professor Ricardo Baeza-Yates (Northeastern University) has done research on information retrieval and web […] The post Interactive Course on Optimizing Search Engines With Ricardo Baeza-Yates Starting May 10 appeared first on Machine Learning Mastery.  ( 2 min )
  • Open

    India and US have decided to advance cooperation in emerging technologies in the fields of communication, artificial intelligence
    submitted by /u/dannylenwinn [link] [comments]  ( 1 min )
    Research on AI!
    Dear all, Currently I am writing my thesis on the effects of Artificial Intelligence (AI) on employee performance through employee engagement. If you work in a company that uses AI and you: - have a direct relationship (e.g. data scientist) or indirect relationship (e.g. business manager) with AI - often or sometimes use AI in your daily work then your insights are essential for my master thesis research and I would really appreciate it if you would fill out the survey below (5-10 min / English & Dutch translation). https://uva.fra1.qualtrics.com/jfe/form/SV_4TTnzJE4IQUoHWK Please feel free to spread the survey to as many relevant people you may know. Much appreciated!! Britt submitted by /u/BrittHermans [link] [comments]  ( 1 min )
    AI Dream 44 - Epic Cathedral Supernatural Visit
    submitted by /u/LordPewPew777 [link] [comments]
    In the deep
    submitted by /u/Hacknaut [link] [comments]
    Nota AI Introduces New Machine Learning Tools Under Its NetsPresso Platform For Automatically Searching Optimized Models And Making Compression Process Easy And Fast
    https://preview.redd.it/beo6czc6hhv81.png?width=1706&format=png&auto=webp&s=385f65ebfed9344781d1f9238d8d508dae27c0a3 In the last decade, AI research has brought astonishing results in many fields, and, undoubtedly, AI is nowadays a central technology in many aspects of our life. As new ideas are proposed every day, this continuous research usually comes with countless applications: from the algorithms assisting surgeons in complex operations to the ones that allow unlocking our phone using just our face. In this evolution from the idea to the actual implementation, it is often ignored how hard the passage between theoretical research and working application is. We can refer to this process as the AI Development Cycle for Edge AI, and it can be divided into three phases related to 1) data, 2) model, and 3) evaluation. Many aspects must be considered: first, each different AI application requires a specific dataset. For this reason, in this step, the aim is to prepare the data, which, as is well known, is one of the crucial topics of AI: a good algorithm always relies on a good dataset. This phase can be divided into data collection, curation, labeling, and preparation. Continue reading submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    [D] MindSpore AI Scientific Computing Series (15): Protein Function Prediction
    submitted by /u/Creative_Habit_6868 [link] [comments]
    Training an AI Hitman To Find Waldo
    submitted by /u/TernaryJimbo [link] [comments]
    Kanye wes t thro the wire
    ​ https://preview.redd.it/qaqevbqlfev81.png?width=1024&format=png&auto=webp&s=8c2759d29a3713c2ffb998e20e8995cee4d8b2b4 submitted by /u/Smek_dev [link] [comments]
  • Open

    Nota AI Introduces New Machine Learning Tools Under Its NetsPresso Platform For Automatically Searching Optimized Models And Making Compression Process Easy And Fast
    In the last decade, AI research has brought astonishing results in many fields, and, undoubtedly, AI is nowadays a central technology in many aspects of our life. As new ideas are proposed every day, this continuous research usually comes with countless applications: from the algorithms assisting surgeons in complex operations to the ones that allow unlocking our phone using just our face. In this evolution from the idea to the actual implementation, it is often ignored how hard the passage between theoretical research and working application is. We can refer to this process as the AI Development Cycle for Edge AI, and it can be divided into three phases related to 1) data, 2) model, and 3) evaluation. Many aspects must be considered: first, each different AI application requires a specific dataset. For this reason, in this step, the aim is to prepare the data, which, as is well known, is one of the crucial topics of AI: a good algorithm always relies on a good dataset. This phase can be divided into data collection, curation, labeling, and preparation. Continue reading submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )

  • Open

    Time Series Analysis for air pollution data not aligned [R] [P]
    This is about a project that I am working on; I hope the ML community can help me! I have collected a few hours of air pollutant data using Aeroqual sensors and custom-made sensors. Three types of data are available in the project: Aeroqual, custom, and council data. The council data can be taken as ground truth (it comes from the government-installed high-spec sensor). Aeroqual is a commercial sensor manufacturing company, so its data should be accurate. The first part of the project is about checking the accuracy of the custom sensor. So, I have done some analysis on the data and found that the custom sensor data is similar to the council sensor data (but not the same; there is a lot of variation in the custom sensor data), while the Aeroqual data is way different. I am attaching the plot which I have made below. So I need to know: is there any method with which I can find a relationship between these three datasets? Is it possible to make these data align together? I need to build an ML model to predict the air pollutant level using this data. Any tips for getting this thing working? - Thanks in advance https://preview.redd.it/fzu7dbsz0dv81.png?width=885&format=png&auto=webp&s=516d65fe3290ac8a28159547880f9dc972922b64 submitted by /u/Codename_17 [link] [comments]  ( 1 min )
    [D] For training a HAAR cascade is it better to manually remove noise from positive training images or to leave it in so the data is more realistic?
    submitted by /u/Counter-Business [link] [comments]  ( 1 min )
    [D] How to convert papers to code?
    My problem is probably what you have guessed: it's understanding the technical specifications, which are usually written in a non-coding-friendly way. Sometimes crucial information is completely missing from the paper, e.g. the loss function description for a DL algorithm. In the lucky cases where implementations of a given paper are already available on GitHub, they are usually either very different from each other in terms of code structure, which calls into question their validity or whether they match what the paper authors intended, especially with varying measurable results, or they are almost exact copies of one another. There are numerous examples where I can show specific papers with varying degrees of complexity, and discuss why the conversion can be tricky, but they may require standalone discussions themselves, likely outside the scope of this one. Is there a way to approach the problem assuming the absence of reference code? submitted by /u/shine-box [link] [comments]  ( 2 min )
    Open Source Model For Identifying Extremism Online [Project]
    submitted by /u/OppositeMonday [link] [comments]
    [P] Tired of manually sending minutes of meeting
    I host an important org-level meeting (~100 attendees) every week and need to share minutes after the meeting. I am so tired of listening to conversations again just to capture important points and summarise the discussion and action items. Is there any model/API which can help me do that? I use Amazon Transcribe to generate transcripts, which helps, but it is not very accurate. For me the priorities would be: 1) a model/API which is better than Amazon Transcribe; 2) automatic speaker identification / speaker diarization (since mostly the same set of people speak); 3) summarising the conversations into topics (we have time- and agenda-based discussion). I am sure this is a problem across the industry, since most meetings happen online and someone wastes hours after each meeting to send notes. I did find some tools which summarise the transcript, but I need to auto-send in a specific format and identify topics based on the conversation (maybe we can input the agenda in advance). Also, this is private information, so I need something on premise; hence I am looking for a repo or model which I can use to build something on top. Please let me know if something exists or if someone is working on similar projects. Happy to collaborate and contribute. submitted by /u/super_commando-dhruv [link] [comments]  ( 1 min )
    [R] ?? Can you find out which news article is written by AI ??
    This research will test the human ability to distinguish human-written text from text generated by artificial intelligence. Participating will only take 10 minutes. You will receive 2 short news articles about the same topic. One will be written by a human, the other will be generated by artificial intelligence. It is up to you to find out which one is written by artificial intelligence. You will be asked to do this for four different subjects, namely: Science, Economics & Politics, Society and Sports. At the end of the survey you will receive feedback on how well you have performed. The human-written articles were collected from various news websites. The articles created by artificial intelligence were generated using GPT-3 from OpenAI. Purpose of the research: We are trying to find out how well GPT-3 performs across subjects. Are there any subjects GPT-3 is better at writing about, or is it equally good across all subjects? Secondly, we are testing the ability of GPT-3 to generate articles about events that happened after the training of the model. You can participate by clicking on the link below; thank you very much for your participation. https://vub.fra1.qualtrics.com/jfe/form/SV_b2E9f6hGxNDH13M submitted by /u/RobinSandersVUB [link] [comments]  ( 1 min )
    [R] I need to run >2000 experiments for my PhD work. How much would 2000 GPUs for 1 day cost?
    2000 GPUs and 8000 CPUs. And where could I even get such a vast affordance? submitted by /u/samlerman [link] [comments]  ( 2 min )
    [P] Vectorflow is a minimalist neural network library optimized for sparse data and single machine environments open sourced by Netflix
    submitted by /u/ur_mum_goes_to_uni [link] [comments]
    [Project] Face detection algorithms comparison
    I selected 5 ready-made algorithms for face detection and compared them with each other by such metrics as Precision, Recall, IOU and time on the dataset I marked up. I am ready to accept your Pull Request with your solutions(algorithms) and results! GitHub: https://github.com/wb-08/face-detection-algorithms-comparison Blog post: https://habr.com/ru/post/661671/ submitted by /u/wb-08 [link] [comments]
    [Discussion] Writing production grade code for ML in python
    I have been interviewing for a machine learning lead position. I have successfully passed 3 interview rounds (coding , HR, system design). I have my final interview with the VP of Engineering. When asked how best to prepare myself, they said they would like to test my ability to write "production quality" code in python. While I do have some experience, the downside is I worked in small R&D teams for a long time. Though I am knowledgeable in python, perhaps, I might have not followed all the industry best practices. If you are a hiring manager or interviewer, how would you test this ability? How do I prepare myself to prove my ability to write production grade code? Thank you all so much in advance. submitted by /u/mbkv [link] [comments]  ( 4 min )
    [D] Comparing the efficiency of different GAN models
    I'm comparing different GAN models (CGan, DCGan, WGan, StyleGan) in tensorflow2. In general, I want to use the images that I generate with the generator to train a classifier while being as realistic as possible. At first, I wanted to let them train for 24 hours each, define some early stopping criteria and save the checkpoints with the lowest loss through a callback. But it seems that the lower loss does not always lead to more realistic images. So how do I compare the different models in a scientific way? Because the results highly depend on the epoch I choose and my subjective feeling, which images look the best. submitted by /u/Bonkikong [link] [comments]  ( 1 min )
    [P] Artificial Nightmares: Split Personality || Clip Guided Diffusion AI Art Video [4K 20 FPS]
    https://www.youtube.com/watch?v=2E_6ARbrMmc submitted by /u/Thenamessd [link] [comments]
    [N] Google's new AI image analysis is pretty LiT - and beats OpenAI's CLIP
    submitted by /u/much_successes [link] [comments]
    [P] A Simpler @PyTorch Annotated Implementation of EleutherAI's 20B Language Model GPT-NeoX.
    Github: https://github.com/labmlai/neox Annotated implementation: https://lit.labml.ai/github/labmlai/neox/tree/main/src/neox/__init__.py Original repo from EleutherAI: https://github.com/EleutherAI/gpt-neox We have included samples showing how to generate text and to fine-tune. We haven't included a bunch of optimizations that were present in original GPT-NeoX to keep things simple. submitted by /u/hnipun [link] [comments]  ( 1 min )
    [P] treequeues: transfer jax pytrees between processes with very high speed!
    Hello! If you are using jax and you need to pass some pytrees between processes, I may have something for you :) I developed a "treequeue". It is a queue that is made for pytrees of nested arrays. The transfer speed is up to 10 times higher than regular queues. This is done by utilizing shared memory arrays and avoiding pickling data. This can be very useful when developing distributed architectures, e.g. distributed reinforcement learning, where speed is of the utmost importance. In my case this implementation was very useful to remove bottlenecks when implementing PBT algorithms! https://github.com/thomashirtz/treequeues Cheers! submitted by /u/krenast [link] [comments]  ( 1 min )
    [D] ‘auton-survival’ package for deep survival analysis and time to event regression from CMU.
    Comes with ‘white paper’ and example notebooks… seems legit..? Anyone tried this out yet? GitHub, Paper submitted by /u/proportional-hazard [link] [comments]
    [P] Unofficial ViT-VQGAN implementation
    I know that many people (including me) were surprised after seeing the image quality of ViT-VQGAN and disappointed to learn that no source code would be released. Therefore, I've decided to implement it myself, and here is the code. I hope this can help everyone as a starting point for ViT-VQGAN. submitted by /u/ThunaClone [link] [comments]
    [R][P] StyleGAN-Human: A Data-Centric Odyssey of Human Generation + Gradio Web Demo
    submitted by /u/Illustrious_Row_9971 [link] [comments]  ( 1 min )
    [D] Review of end-to-end multi-modal deep learning approach for autonomous navigation
    In reviewing various approaches to end-to-end deep learning for autonomous driving, I've come across an interesting approach in this paper that I would like to discuss with others... I will begin by summarizing the approach: A ResNet50 architecture is used as an encoder network with the input being an RGB image + depth map concatenated as (224 x 224 x 4). In the paper it is argued that a point cloud can also be used, or some other sensor modality would also work. The encoder network output (feature map of 7 x 7 x 2048) is fed into a decoder network that takes it back to (224 x 224 x 5) with pixel-wise semantic segmentation of 5 classes: lane, road line, sidewalk, vehicles or pedestrians, and others. That same encoder output (feature map of 7 x 7 x 2048) is global average pooled to 2…  ( 2 min )
  • Open

    mGPT: Few-Shot Learners Go Multilingual
    submitted by /u/Illustrious_Row_9971 [link] [comments]
    Biological feedback will save us all
    Dall-E-2. Excellent. It's very high quality. But it's a combination of the data. What did we want? We wanted some amazing work that made us cry with just one line of writing or one image. "Oh, it copied it well. It's pretty much the same." That's not enough. But how can that be improved? I think the answer is the feedback method. The current evaluation methods for writing, images, video, and sound are too indirect: sales revenue; number of subscribers; number of views; likes/dislikes; ratings by section and revisit rate (these are better than the others); emotion analysis of comments using AI; internal staff scores. There are so many conditions, other than the quality of the content, in which people's judgment can intervene. In the first place, people don't express exactly what they…  ( 3 min )
    16 images generated for text prompt "Woah there, Dragonman!" using a text-to-image AI model from CompVis that uses latent diffusion (crosspost of another user's post)
    submitted by /u/Wiskkey [link] [comments]  ( 1 min )
    NVIDIA Instant NeRF: Turn Photos into 3D Scenes in Milliseconds ! Video demo
    submitted by /u/OnlyProggingForFun [link] [comments]  ( 1 min )
    GOOGLE researchers create animated avatars from a single photo
    submitted by /u/SpatialComputing [link] [comments]  ( 1 min )
    MIT's new machine-learning system M2I may someday help driverless cars predict the next moves of others
    submitted by /u/qptbook [link] [comments]
    Artificial Nightmares: Split Personality || Clip Guided Diffusion AI Art Video [4K 20 FPS]
    submitted by /u/Thenamessd [link] [comments]
    Human Like AI where should i start
    Hello there, if one wanted to get into AI, and especially human-like AI, would you still recommend getting into machine learning first? As far as I know, machine learning doesn't even try to develop "human-like" AI/"bottom-up AI", but rather focuses on training algorithms to solve specific problems. I know human-like AI is highly complex and we still need years, if not decades, to achieve something even close to it, but I would appreciate tips and ideas nonetheless. (After reading through my question again, this sounds like a generic question that's asked here every day; if that's the case, please send me a link to a similar post if there is one :) ) submitted by /u/Garic152 [link] [comments]  ( 1 min )
    help with a project idea
    Hi everyone, I'm doing a project with my friends where we should use computer vision/IoT to create a solution for people with disabilities or for the healthcare system. Any ideas, please? submitted by /u/armyy__ [link] [comments]
    Meta AI Researchers Built An End-To-End Machine Learning Platform Called Looper, With Easy-To-Use APIs For Decision-Making And Feedback Collection
    From improving the user experience to making the computational infrastructure more effective, AI is a crucial aspect of making current software systems and products perform as well as possible. AI is often more effective than even precisely developed human-crafted heuristic tactics today, whether it’s reducing latency, boosting the quality of a video stream, or streamlining the interfaces to match a specific person’s demands. But, to use AI more effectively in various products, several challenges must be addressed: the system must accommodate software engineers without machine learning backgrounds; it must provide mechanisms to optimize for a variety of product goals, which may differ from closed-form machine learning loss functions; it must distinguish causal connections from data correlations; and it must scale efficiently to train, host, and monitor vast numbers of AI models. Meta Researchers Develop ‘Looper,’ an end-to-end AI platform that has been designed with easy-to-use APIs for optimization, personalization, and feedback collecting to answer these needs. Looper may be used to support the entire machine learning lifecycle, from model training to deployment and inference to product evaluation and optimization. Looper allows us to modify the existing products to leverage AI for personalized optimizations rather than having to rebuild them around AI models. Currently, the Looper platform hosts 700 AI models and produces 4 million AI outputs every second. Continue reading Paper: https://arxiv.org/pdf/2110.07554.pdf submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    Are there any programs that can output a sentence based on input sentences?
    I'm looking to create a way to automate original story ideas based on previous ideas. I want to be able to input 1000+ original sentences and get as output an original sentence that is inspired by the previous ones. Are there any programs that can do this, or will I need to develop my own? submitted by /u/yea_okay_dude [link] [comments]  ( 1 min )
    Ultimate Guide to Activation Functions
    submitted by /u/SirFletch [link] [comments]
  • Open

    How to stop stable baseline model during the training exactly at the end of frame?
    I am training a PPO2 model with the stable-baselines library. I have tabular data with 15000 rows, so the length of each episode is 15000. I am using nminibatches=4, n_envs=1. For example, I have set total_timesteps=10000. During training the agent will see the 15000 rows several times and update its actions for each row, but at some point the remaining total_timesteps will not be enough to see a full episode, so only part of an episode is available in the last step of learning. To be concrete, and for simplicity, let's say we have 10 rows and total_timesteps=23. The agent will see the full episode 2 times and only the first 3 rows the third time; the remaining 7 rows are not seen during the last pass. I want to stop the learning process when the agent finishes the last full episode (in the example above, stop learning at total_timesteps=20), or define total_timesteps in such a way that training ends on a full episode. submitted by /u/Mariam_Dundua [link] [comments]  ( 1 min )
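    The second option in the post (choosing total_timesteps so that training ends on a full episode) can be sketched in plain Python. This is a minimal sketch, not library code: the helper name `aligned_total_timesteps` and its `round_up` flag are hypothetical, and it only does the arithmetic from the 10-row, 23-timestep example. It assumes every episode has exactly `episode_len` steps, and it does not model the library's own rollout batching, which may also affect where training actually stops.

```python
def aligned_total_timesteps(requested: int, episode_len: int, round_up: bool = False) -> int:
    """Round a timestep budget to a whole number of fixed-length episodes.

    Hypothetical helper: assumes every episode lasts exactly `episode_len` steps.
    """
    if episode_len <= 0:
        raise ValueError("episode_len must be positive")
    # Number of complete episodes that fit in the requested budget.
    full_episodes = requested // episode_len
    # Optionally include the partially covered episode as a full one.
    if round_up and requested % episode_len:
        full_episodes += 1
    return full_episodes * episode_len


# The example from the post: 10 rows per episode, 23 requested timesteps.
print(aligned_total_timesteps(23, 10))                 # 20
print(aligned_total_timesteps(23, 10, round_up=True))  # 30
```

    The resulting value would then be passed as total_timesteps to the training call; rounding down never exceeds the original budget, while rounding up guarantees at least one full episode when the budget is smaller than an episode.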
    New to RL
    Hello guys, I am pretty new to the RL field and right now I am doing my thesis in it. I've come across a problem in my code. I created a custom environment, and when I try to solve it with my DQN agent using stable baselines3, I am able to execute the code and print out the required things, but the agent is not learning. Any help? Thanks. submitted by /u/last_2_brain_cells97 [link] [comments]  ( 1 min )
    Questions on policy gradients
    Hi guys, I am new to RL and reading the Spinning Up tutorial, which focuses on policy-based algorithms. In the derivation of VPG, the tutorial says: "The environment has no dependence on /theta (the parameter of the policy), so gradients of R(/tau) (total return of the trajectory) with respect to /theta are 0." However, the trajectory depends on our policy, and our policy depends on /theta. As a result, I am confused about why the total return of the trajectory is independent of /theta. submitted by /u/SkyRimT [link] [comments]  ( 2 min )
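    One common way to read the quoted statement is sketched below. For a fixed trajectory, R(/tau) is just a number; the dependence on /theta enters through the probability of sampling that trajectory, not through its return. The notation is an assumption layered on the post: \rho_0 for the initial-state distribution, P(s_{t+1} | s_t, a_t) for the environment dynamics, and J(\theta) for the expected return.

```latex
\begin{align*}
J(\theta)
  &= \mathbb{E}_{\tau \sim \pi_\theta}\left[ R(\tau) \right]
   = \int_\tau P(\tau \mid \theta)\, R(\tau) \\
\nabla_\theta J(\theta)
  &= \int_\tau \nabla_\theta P(\tau \mid \theta)\, R(\tau)
   = \mathbb{E}_{\tau \sim \pi_\theta}\left[ \nabla_\theta \log P(\tau \mid \theta)\, R(\tau) \right] \\
\log P(\tau \mid \theta)
  &= \log \rho_0(s_0)
   + \sum_{t=0}^{T-1} \left[ \log P(s_{t+1} \mid s_t, a_t) + \log \pi_\theta(a_t \mid s_t) \right] \\
\nabla_\theta \log P(\tau \mid \theta)
  &= \sum_{t=0}^{T-1} \nabla_\theta \log \pi_\theta(a_t \mid s_t)
\end{align*}
```

    In this reading, the expected return J(\theta) does depend on \theta, but only via P(\tau | \theta). Inside the integral, R(\tau), \rho_0 and the dynamics are functions of the states and actions alone, so their gradients with respect to \theta vanish and only the policy terms survive.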
    Vicarious exits: acquihired by Google robotics (Intrinsic) & DeepMind
    submitted by /u/gwern [link] [comments]
  • Open

    GOOGLE researchers create animated avatars from a single photo
    submitted by /u/SpatialComputing [link] [comments]  ( 1 min )
    I don't understand why I am getting NaN loss scores. Can anyone explain what I am doing wrong ?
    submitted by /u/brike3 [link] [comments]  ( 1 min )
    Are there applications of neural networks other than machine learning?
    I see lots of hardware oriented toward AI/ML stuff these days, including chips with hardware acceleration for neural networks. I'm thinking about how GPUs were initially designed for graphics calculations, but then things like CUDA and OpenCL were developed to make that hardware usable for broader applications of parallel processing. Are there any other things that you can do with a neural network besides backpropagation, that wouldn't be easier to do in other ways? submitted by /u/Bananawamajama [link] [comments]  ( 1 min )
  • Open

    My Paper Reviewing Load
    In academia, for better or worse, we have what’s called a peer review system, where papers get accepted to journals, conferences, or other venues on the basis of reviews from other researchers, who ideally are subject area experts and thus are qualified to evaluate the paper. The reviewers also cannot have a conflict of interest with the authors, and should not be overwhelmed with too many papers to review. This is the ideal world, and is not always what happens in practice. From my experience in the robotics academic community (and this may apply to other disciplines), it generally seems like there is no standard definition of an “appropriate” or “maximum” reviewing load for a reviewer. This is difficult to define as different papers mandate different reviewing efforts; a massive journal …  ( 4 min )
  • Open

    A manifold learning approach for gesture recognition from micro-Doppler radar measurements. (arXiv:2110.01670v4 [cs.LG] UPDATED)
    A recent paper (Neural Networks, 132 (2020), 253-268) introduces a straightforward and simple kernel based approximation for manifold learning that does not require the knowledge of anything about the manifold, except for its dimension. In this paper, we examine how the pointwise error in approximation using least squares optimization based on similarly localized kernels depends upon the data characteristics and deteriorates as one goes away from the training data. The theory is presented with an abstract localized kernel, which can utilize any prior knowledge about the data being located on an unknown sub-manifold of a known manifold. We demonstrate the performance of our approach using a publicly available micro-Doppler data set, and investigate the use of different preprocessing measures, kernels, and manifold dimensions. Specifically, it is shown that the localized kernel introduced in the above mentioned paper when used with PCA components leads to a near-competitive performance to deep neural networks, and offers significant improvements in training speed and memory requirements. To demonstrate the fact that our methods are agnostic to the domain knowledge, we examine the classification problem in a simple video data set.
    Bayesian Learning via Neural Schrödinger-Föllmer Flows. (arXiv:2111.10510v8 [stat.ML] UPDATED)
    In this work we explore a new framework for approximate Bayesian inference in large datasets based on stochastic control (i.e. Schrödinger bridges). We advocate stochastic control as a finite time and low variance alternative to popular steady-state methods such as stochastic gradient Langevin dynamics (SGLD). Furthermore, we discuss and adapt the existing theoretical guarantees of this framework and establish connections to already existing VI routines in SDE-based models.
    Accurate detection of sepsis at ED triage using machine learning with clinical natural language processing. (arXiv:2204.07657v2 [cs.LG] UPDATED)
    Sepsis is a life-threatening condition with organ dysfunction and is a leading cause of death and critical illness worldwide. Accurate detection of sepsis during emergency department triage would allow early initiation of lab analysis, antibiotic administration, and other sepsis treatment protocols. The purpose of this study was to determine whether EHR data can be extracted and synthesized with the latest machine learning algorithms (KATE Sepsis) and clinical natural language processing to produce accurate sepsis models, and compare KATE Sepsis performance with existing sepsis screening protocols, such as SIRS and qSOFA. A machine learning model (KATE Sepsis) was developed using patient encounters with triage data from 16 participating hospitals. KATE Sepsis, SIRS, standard screening (SIRS with source of infection) and qSOFA were tested in three settings. Cohort-A was a retrospective analysis on medical records from a single Site 1. Cohort-B was a prospective analysis of Site 1. Cohort-C was a retrospective analysis on Site 1 with 15 additional sites. Across all cohorts, KATE Sepsis demonstrates an AUC of 0.94-0.963 with 73-74.87% TPR and 3.76-7.17% FPR. Standard screening demonstrates an AUC of 0.682-0.726 with 39.39-51.19% TPR and 2.9-6.02% FPR. The qSOFA protocol demonstrates an AUC of 0.544-0.56, with 10.52-13.18% TPR and 1.22-1.68% FPR. For severe sepsis, across all cohorts, KATE Sepsis demonstrates an AUC of 0.935-0.972 with 70-82.26% TPR and 4.64-8.62% FPR. For septic shock, across all cohorts, KATE Sepsis demonstrates an AUC of 0.96-0.981 with 85.71-89.66% TPR and 4.85-8.8% FPR. SIRS, standard screening, and qSOFA demonstrate low AUC and TPR for severe sepsis and septic shock detection. KATE Sepsis provided substantially better sepsis detection performance in triage than commonly used screening protocols.
    Visual Attention Methods in Deep Learning: An In-Depth Survey. (arXiv:2204.07756v2 [cs.CV] UPDATED)
    Inspired by the human cognitive system, attention is a mechanism that imitates the human cognitive awareness about specific information, amplifying critical details to focus more on the essential aspects of data. Deep learning has employed attention to boost performance for many applications. Interestingly, the same attention design can suit processing different data modalities and can easily be incorporated into large networks. Furthermore, multiple complementary attention mechanisms can be incorporated in one network. Hence, attention techniques have become extremely attractive. However, the literature lacks a comprehensive survey specific to attention techniques to guide researchers in employing attention in their deep models. Note that, besides being demanding in terms of training data and computational resources, transformers only cover a single category in self-attention out of the many categories available. We fill this gap and provide an in-depth survey of 50 attention techniques categorizing them by their most prominent features. We initiate our discussion by introducing the fundamental concepts behind the success of attention mechanism. Next, we furnish some essentials such as the strengths and limitations of each attention category, describe their fundamental building blocks, basic formulations with primary usage, and applications specifically for computer vision. We also discuss the challenges and open questions related to attention mechanism in general. Finally, we recommend possible future research directions for deep attention.
    On Distribution Shift in Learning-based Bug Detectors. (arXiv:2204.10049v1 [cs.LG])
    Deep learning has recently achieved initial success in program analysis tasks such as bug detection. Lacking real bugs, most existing works construct training and test data by injecting synthetic bugs into correct programs. Despite achieving high test accuracy (e.g. >90%), the resulting bug detectors are found to be surprisingly unusable in practice, i.e., <10% precision when used to scan real software repositories. In this work, we argue that this massive performance difference is caused by distribution shift, i.e., a fundamental mismatch between the real bug distribution and the synthetic bug distribution used to train and evaluate the detectors. To address this key challenge, we propose to train a bug detector in two phases, first on a synthetic bug distribution to adapt the model to the bug detection domain, and then on a real bug distribution to drive the model towards the real distribution. During these two phases, we leverage a multi-task hierarchy, focal loss, and contrastive learning to further boost performance. We evaluate our approach extensively on three widely studied bug types, for which we construct new datasets carefully designed to capture the real bug distribution. The results demonstrate that our approach is practically effective and successfully mitigates the distribution shift: our learned detectors are highly performant on both our constructed test set and the latest version of open source repositories.
    Persua: A Visual Interactive System to Enhance the Persuasiveness of Arguments in Online Discussion. (arXiv:2204.07741v2 [cs.HC] UPDATED)
    Persuading people to change their opinions is a common practice in online discussion forums on topics ranging from political campaigns to relationship consultation. Enhancing people's ability to write persuasive arguments could not only practice their critical thinking and reasoning but also contribute to the effectiveness and civility in online communication. It is, however, not an easy task in online discussion settings where written words are the primary communication channel. In this paper, we derived four design goals for a tool that helps users improve the persuasiveness of arguments in online discussions through a survey with 123 online forum users and interviews with five debating experts. To satisfy these design goals, we analyzed and built a labeled dataset of fine-grained persuasive strategies (i.e., logos, pathos, ethos, and evidence) in 164 arguments with high ratings on persuasiveness from ChangeMyView, a popular online discussion forum. We then designed an interactive visual system, Persua, which provides example-based guidance on persuasive strategies to enhance the persuasiveness of arguments. In particular, the system constructs portfolios of arguments based on different persuasive strategies applied to a given discussion topic. It then presents concrete examples based on the difference between the portfolios of user input and high-quality arguments in the dataset. A between-subjects study shows suggestive evidence that Persua encourages users to submit more times for feedback and helps users improve more on the persuasiveness of their arguments than a baseline system. Finally, a set of design considerations was summarized to guide future intelligent systems that improve the persuasiveness in text.
    Learning to Hash Naturally Sorts. (arXiv:2201.13322v2 [cs.CV] UPDATED)
    Learning to hash pictures a list-wise sorting problem. Its testing metrics, e.g., mean-average precision, count on a sorted candidate list ordered by pair-wise code similarity. However, scarcely does one train a deep hashing model with the sorted results end-to-end because of the non-differentiable nature of the sorting operation. This inconsistency in the objectives of training and test may lead to sub-optimal performance since the training loss often fails to reflect the actual retrieval metric. In this paper, we tackle this problem by introducing Naturally-Sorted Hashing (NSH). We sort the Hamming distances of samples' hash codes and accordingly gather their latent representations for self-supervised training. Thanks to the recent advances in differentiable sorting approximations, the hash head receives gradients from the sorter so that the hash encoder can be optimized along with the training procedure. Additionally, we describe a novel Sorted Noise-Contrastive Estimation (SortedNCE) loss that selectively picks positive and negative samples for contrastive learning, which allows NSH to mine data semantic relations during training in an unsupervised manner. Our extensive experiments show the proposed NSH model significantly outperforms the existing unsupervised hashing methods on three benchmarked datasets.
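    The sorted candidate list that the abstract refers to can be illustrated with ordinary, non-differentiable ranking by Hamming distance. This is only a sketch of the test-time ranking step, assuming hash codes are stored as Python integers; it is not the differentiable sorter or the SortedNCE loss from the paper, and the function names are hypothetical.

```python
from typing import List


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two binary codes differ."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query: int, database: List[int]) -> List[int]:
    """Return database indices ordered from nearest to farthest code.

    Python's sort is stable, so ties keep their original order.
    """
    return sorted(range(len(database)), key=lambda i: hamming(query, database[i]))


# Toy 4-bit codes (made up for illustration).
codes = [0b1010, 0b0101, 0b1000, 0b1111]
print(rank_by_hamming(0b1010, codes))  # [0, 2, 3, 1]
```

    Retrieval metrics such as mean-average precision are computed on a list ordered like this one, which is why the abstract describes a mismatch when training never sees the sorted result.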
    Random Dilated Shapelet Transform: A New Approach for Time Series Shapelets. (arXiv:2109.13514v2 [cs.CV] UPDATED)
    Shapelet-based algorithms are widely used for time series classification because of their ease of interpretation, but they are currently outperformed by recent state-of-the-art approaches. We present a new formulation of time series shapelets including the notion of dilation, and we introduce a new shapelet feature to enhance their discriminative power for classification. Experiments performed on 112 datasets show that our method improves on the state-of-the-art shapelet algorithm, and achieves comparable accuracy to recent state-of-the-art approaches, without sacrificing either scalability or interpretability.
    Backplay: "Man muss immer umkehren". (arXiv:1807.06919v5 [cs.LG] UPDATED)
    Model-free reinforcement learning (RL) requires a large number of trials to learn a good policy, especially in environments with sparse rewards. We explore a method to improve the sample efficiency when we have access to demonstrations. Our approach, Backplay, uses a single demonstration to construct a curriculum for a given task. Rather than starting each training episode in the environment's fixed initial state, we start the agent near the end of the demonstration and move the starting point backwards during the course of training until we reach the initial state. Our contributions are that we analytically characterize the types of environments where Backplay can improve training speed, demonstrate the effectiveness of Backplay both in large grid worlds and a complex four player zero-sum game (Pommerman), and show that Backplay compares favorably to other competitive methods known to improve sample efficiency. This includes reward shaping, behavioral cloning, and reverse curriculum generation.
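    The curriculum described above (start near the end of the demonstration, then move the starting point backwards) could be sketched as a start-index schedule. Everything specific here is an assumption for illustration: the abstract does not give the actual schedule, so the fixed-size window, the `step_back_every` parameter, and the function name are hypothetical.

```python
import random


def backplay_start_index(iteration, demo_len, step_back_every, window, rng=random):
    """Pick the demonstration step from which to start a training episode.

    Hypothetical linear schedule: demonstration states are indexed
    0 .. demo_len - 1, and a window of `window` candidate start states
    slides toward index 0 every `step_back_every` training iterations.
    """
    # How many times the window has shifted toward the initial state.
    shifts = iteration // step_back_every
    # Latest allowed start state, clamped at the initial state.
    latest = max(demo_len - 1 - shifts * window, 0)
    # Earliest allowed start state in the current window.
    earliest = max(latest - window + 1, 0)
    return rng.randint(earliest, latest)


# Early in training, episodes start close to the end of the demonstration.
print(backplay_start_index(0, demo_len=100, step_back_every=10, window=5))
# Late in training, episodes start from the environment's initial state.
print(backplay_start_index(10_000, demo_len=100, step_back_every=10, window=5))
```

    An environment wrapper would then reset to the demonstration state at the returned index; how that reset is done depends on the simulator and is not covered here.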
    Deep learning techniques for energy clustering in the CMS ECAL. (arXiv:2204.10277v1 [hep-ex])
    The reconstruction of electrons and photons in CMS depends on topological clustering of the energy deposited by an incident particle in different crystals of the electromagnetic calorimeter (ECAL). These clusters are formed by aggregating neighbouring crystals according to the expected topology of an electromagnetic shower in the ECAL. The presence of upstream material (beampipe, tracker and support structures) causes electrons and photons to start showering before reaching the calorimeter. This effect, combined with the 3.8T CMS magnetic field, leads to energy being spread in several clusters around the primary one. It is essential to recover the energy contained in these satellite clusters in order to achieve the best possible energy resolution for physics analyses. Historically satellite clusters have been associated to the primary cluster using a purely topological algorithm which does not attempt to remove spurious energy deposits from additional pileup interactions (PU). The performance of this algorithm is expected to degrade during LHC Run 3 (2022+) because of the larger average PU levels and the increasing levels of noise due to the ageing of the ECAL detector. New methods are being investigated that exploit state-of-the-art deep learning architectures like Graph Neural Networks (GNN) and self-attention algorithms. These more sophisticated models improve the energy collection and are more resilient to PU and noise, helping to preserve the electron and photon energy resolution achieved during LHC Runs 1 and 2. This work will cover the challenges of training the models as well as the opportunity that this new approach offers to unify the ECAL energy measurement with the particle identification steps used in the global CMS photon and electron reconstruction.
    Condition Monitoring of Transformer Bushings Using Computational Intelligence. (arXiv:2204.10193v1 [cs.LG])
    Dissolved Gas-in-oil analysis (DGA) is used to monitor the condition of bushings on large power transformers. There are different techniques used in determining the conditions from the data collected, but in this work the Artificial Intelligence techniques are investigated. This work investigates which gases in DGA are related to each other and which ones are important for making decisions. When the related and crucial gases are determined, the other gases are discarded thereby reducing the number of attributes in DGA. Hence a further investigation is done to see how these new datasets influence the performance of the classifiers used to classify the DGA of full attributes. The classifiers used in these experiments were Backpropagation Neural Networks (BPNN) and Support Vector Machines (SVM) whereas the Principal Component Analysis (PCA), Rough Set (RS), Incremental Granular Ranking (GR++) and Decision Trees (DT) were used to reduce the attributes of the dataset. The parameters used when training the BPNN and SVM classifiers are kept fixed to create a controlled test environment when investigating the effects of reducing the number of gases. This work further introduced a new classifier that can handle high dimension dataset and noisy dataset, Rough Neural Network (RNN).
    Geometry-Aware Supertagging with Heterogeneous Dynamic Convolutions. (arXiv:2203.12235v2 [cs.CL] UPDATED)
    The syntactic categories of categorial grammar formalisms are structured units made of smaller, indivisible primitives, bound together by the underlying grammar's category formation rules. In the trending approach of constructive supertagging, neural models are increasingly made aware of the internal category structure, which in turn enables them to more reliably predict rare and out-of-vocabulary categories, with significant implications for grammars previously deemed too complex to find practical use. In this work, we revisit constructive supertagging from a graph-theoretic perspective, and propose a framework based on heterogeneous dynamic graph convolutions aimed at exploiting the distinctive structure of a supertagger's output space. We test our approach on a number of categorial grammar datasets spanning different languages and grammar formalisms, achieving substantial improvements over previous state of the art scores. Code will be made available at https://github.com/konstantinosKokos/dynamic-graph-supertagging
    Hybrid Cloud-Edge Collaborative Data Anomaly Detection in Industrial Sensor Networks. (arXiv:2204.09942v1 [cs.CR])
    Industrial control systems (ICSs) are facing increasing cyber-physical attacks that can cause catastrophes in the physical system. Efficient anomaly detection models in industrial sensor networks are essential for enhancing ICS reliability and security, because the sensor data are related to the operational state of the ICS. Considering the limited availability of computing resources, this paper proposes a hybrid anomaly detection approach in cloud-edge collaboration industrial sensor networks. The hybrid approach consists of sensor data detection models deployed at the edges and a sensor data analysis model deployed in the cloud. The sensor data detection model based on Gaussian and Bayesian algorithms can detect the anomalous sensor data in real-time and upload them to the cloud for further analysis, filtering the normal sensor data and reducing traffic load. The sensor data analysis model based on Graph convolutional network, Residual algorithm and Long short-term memory network (GCRL) can effectively extract the spatial and temporal features and then identify the attack precisely. The proposed hybrid anomaly detection approach is evaluated using a benchmark dataset and baseline anomaly detection models. The experimental results show that the proposed approach can achieve an overall 11.19% increase in Recall and an impressive 14.29% improvement in F1-score, compared with the existing models.
    Conditionally Adaptive Multi-Task Learning: Improving Transfer Learning in NLP Using Fewer Parameters & Less Data. (arXiv:2009.09139v3 [cs.LG] UPDATED)
    Multi-Task Learning (MTL) networks have emerged as a promising method for transferring learned knowledge across different tasks. However, MTL must deal with challenges such as: overfitting to low resource tasks, catastrophic forgetting, and negative task transfer, or learning interference. Often, in Natural Language Processing (NLP), a separate model per task is needed to obtain the best performance. However, many fine-tuning approaches are both parameter inefficient, i.e., potentially involving one new model per task, and highly susceptible to losing knowledge acquired during pretraining. We propose a novel Transformer architecture consisting of a new conditional attention mechanism as well as a set of task-conditioned modules that facilitate weight sharing. Through this construction (a hypernetwork adapter), we achieve more efficient parameter sharing and mitigate forgetting by keeping half of the weights of a pretrained model fixed. We also use a new multi-task data sampling strategy to mitigate the negative effects of data imbalance across tasks. Using this approach, we are able to surpass single task fine-tuning methods while being parameter and data efficient (using around 66% of the data for weight updates). Compared to other BERT Large methods on GLUE, our 8-task model surpasses other Adapter methods by 2.8% and our 24-task model outperforms by 0.7-1.0% models that use MTL and single task fine-tuning. We show that a larger variant of our single multi-task model approach performs competitively across 26 NLP tasks and yields state-of-the-art results on a number of test and development sets. Our code is publicly available at https://github.com/CAMTL/CA-MTL.
    The Silent Problem -- Machine Learning Model Failure -- How to Diagnose and Fix Ailing Machine Learning Models. (arXiv:2204.10227v1 [cs.LG])
    The COVID-19 pandemic has dramatically changed how healthcare is delivered to patients, how patients interact with healthcare providers, and how healthcare information is disseminated to both healthcare providers and patients. Analytical models that were trained and tested pre-pandemic may no longer be performing up to expectations, yielding unreliable and irrelevant machine learning (ML) models, given that ML depends on the basic principle that what happened in the past is likely to repeat in the future. ML faces two important degradation principles: concept drift, when the underlying properties and characteristics of the variables change, and data drift, when the data distributions, probabilities, covariates, and other variable relationships change. Both are prime culprits of model failure. Therefore, detecting and diagnosing drift in existing models has become imperative. Perhaps even more important is a shift in our mindset towards a conscious recognition that drift is inevitable, and model building must incorporate intentional resilience, the ability to offset and recover quickly from failure, and proactive robustness, avoiding failure by developing models that are less vulnerable to drift and disruption.
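A minimal sketch of how data drift in a single numeric feature might be flagged. The abstract does not name a detection method; the two-sample Kolmogorov-Smirnov statistic used here is a swapped-in, commonly used choice, and the names `ks_statistic`, `drift_detected`, and the 5% critical coefficient 1.358 (a standard large-sample approximation) are assumptions for illustration only.

```python
import math
from typing import Sequence


def ks_statistic(reference: Sequence[float], current: Sequence[float]) -> float:
    """Largest gap between the two empirical CDFs (two-sample KS statistic)."""
    a, b = sorted(reference), sorted(current)
    na, nb = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < na and j < nb:
        x = min(a[i], b[j])
        # Advance both samples past every value <= x so ties are handled together.
        while i < na and a[i] <= x:
            i += 1
        while j < nb and b[j] <= x:
            j += 1
        d = max(d, abs(i / na - j / nb))
    return d


def drift_detected(reference: Sequence[float], current: Sequence[float],
                   coeff: float = 1.358) -> bool:
    """Flag drift when the KS statistic exceeds the approximate 5% critical value.

    coeff=1.358 is the usual asymptotic coefficient for alpha=0.05 (assumption:
    samples are large enough for the approximation to be reasonable).
    """
    na, nb = len(reference), len(current)
    threshold = coeff * math.sqrt((na + nb) / (na * nb))
    return ks_statistic(reference, current) > threshold


# Hypothetical pre- and post-shift samples of one feature.
pre_shift = list(range(100))
post_shift = [x + 50 for x in pre_shift]
print(drift_detected(pre_shift, pre_shift))   # identical samples: no drift flagged
print(drift_detected(pre_shift, post_shift))  # shifted sample: drift flagged
```

This only covers data drift in one variable at a time; concept drift (a change in the relationship between inputs and the target) would need labels or model-error monitoring instead.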
    A Revealing Large-Scale Evaluation of Unsupervised Anomaly Detection Algorithms. (arXiv:2204.09825v1 [cs.LG])
    Anomaly detection has many applications ranging from bank-fraud detection and cyber-threat detection to equipment maintenance and health monitoring. However, choosing a suitable algorithm for a given application remains a challenging design decision, often informed by the literature on anomaly detection algorithms. We extensively reviewed twelve of the most popular unsupervised anomaly detection methods. We observed that, so far, they have been compared using inconsistent protocols - the choice of the class of interest or the positive class, the split of training and test data, and the choice of hyperparameters - leading to ambiguous evaluations. This observation led us to define a coherent evaluation protocol which we then used to produce an updated and more precise picture of the relative performance of the twelve methods on five widely used tabular datasets. While our evaluation cannot pinpoint a method that outperforms all the others on all datasets, it identifies those that stand out and revises misconceived knowledge about their relative performances.
    Multi-label classification for biomedical literature: an overview of the BioCreative VII LitCovid Track for COVID-19 literature topic annotations. (arXiv:2204.09781v1 [cs.DL])
    The COVID-19 pandemic has been severely impacting global society since December 2019. Massive research has been undertaken to understand the characteristics of the virus and design vaccines and drugs. The related findings have been reported in biomedical literature at a rate of about 10,000 articles on COVID-19 per month. Such rapid growth significantly challenges manual curation and interpretation. For instance, LitCovid is a literature database of COVID-19-related articles in PubMed, which has accumulated more than 200,000 articles with millions of accesses each month by users worldwide. One primary curation task is to assign up to eight topics (e.g., Diagnosis and Treatment) to the articles in LitCovid. Despite the continuing advances in biomedical text mining methods, few have been dedicated to topic annotations in COVID-19 literature. To close the gap, we organized the BioCreative LitCovid track to call for a community effort to tackle automated topic annotation for COVID-19 literature. The BioCreative LitCovid dataset, consisting of over 30,000 articles with manually reviewed topics, was created for training and testing. It is one of the largest multilabel classification datasets in biomedical scientific literature. 19 teams worldwide participated and made 80 submissions in total. Most teams used hybrid systems based on transformers. The highest performing submissions achieved 0.8875, 0.9181, and 0.9394 for macro F1-score, micro F1-score, and instance-based F1-score, respectively. The level of participation and results demonstrate a successful track and help close the gap between dataset curation and method development. The dataset is publicly available via https://ftp.ncbi.nlm.nih.gov/pub/lu/LitCovid/biocreative/ for benchmarking and further development.
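The three scores reported above (macro, micro, and instance-based F1) aggregate the same counts in different ways. The sketch below shows the standard definitions for multi-label predictions; the function name `multilabel_f1` and the toy topic assignments are hypothetical and are not taken from the track's evaluation scripts.

```python
from typing import Dict, List, Set


def _f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def multilabel_f1(gold: List[Set[str]], pred: List[Set[str]]) -> Dict[str, float]:
    """Macro, micro, and instance-based F1 for multi-label annotations."""
    labels = sorted(set().union(*gold, *pred))
    per_label = []
    tot_tp = tot_fp = tot_fn = 0
    for lab in labels:
        tp = sum(1 for g, p in zip(gold, pred) if lab in g and lab in p)
        fp = sum(1 for g, p in zip(gold, pred) if lab not in g and lab in p)
        fn = sum(1 for g, p in zip(gold, pred) if lab in g and lab not in p)
        per_label.append(_f1(tp, fp, fn))
        tot_tp, tot_fp, tot_fn = tot_tp + tp, tot_fp + fp, tot_fn + fn
    # Instance-based: score each article's label set on its own, then average.
    per_instance = [_f1(len(g & p), len(p - g), len(g - p)) for g, p in zip(gold, pred)]
    return {
        "macro": sum(per_label) / len(per_label) if per_label else 0.0,
        "micro": _f1(tot_tp, tot_fp, tot_fn),
        "instance": sum(per_instance) / len(per_instance) if per_instance else 0.0,
    }


# Toy example with three articles and three topics (illustrative data only).
gold = [{"Diagnosis", "Treatment"}, {"Prevention"}, {"Treatment"}]
pred = [{"Diagnosis"}, {"Prevention", "Treatment"}, {"Treatment"}]
scores = multilabel_f1(gold, pred)
print(scores)
```

Macro F1 weights every topic equally, so rare topics matter as much as common ones; micro F1 pools all decisions, so frequent topics dominate; instance-based F1 reflects how well each article's full label set is recovered.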
    Accurate Molecular-Orbital-Based Machine Learning Energies via Unsupervised Clustering of Chemical Space. (arXiv:2204.09831v1 [physics.chem-ph])
    We introduce an unsupervised clustering algorithm to improve training efficiency and accuracy in predicting energies using molecular-orbital-based machine learning (MOB-ML). This work determines clusters via the Gaussian mixture model (GMM) in an entirely automatic manner and simplifies an earlier supervised clustering approach [J. Chem. Theory Comput., 15, 6668 (2019)] by eliminating both the necessity for user-specified parameters and the training of an additional classifier. Unsupervised clustering results from GMM have the advantage of accurately reproducing chemically intuitive groupings of frontier molecular orbitals and having improved performance with an increasing number of training examples. The resulting clusters from supervised or unsupervised clustering are further combined with scalable Gaussian process regression (GPR) or linear regression (LR) to learn molecular energies accurately by generating a local regression model in each cluster. Among all four combinations of regressors and clustering methods, GMM combined with scalable exact Gaussian process regression (GMM/GPR) is the most efficient training protocol for MOB-ML. The numerical tests of molecular energy learning on thermalized datasets of drug-like molecules demonstrate the improved accuracy, transferability, and learning efficiency of GMM/GPR over the other training protocols for MOB-ML, i.e., supervised regression-clustering combined with GPR (RC/GPR) and GPR without clustering. GMM/GPR also provides the best molecular energy predictions compared with those from the literature on the same benchmark datasets. With a lower scaling, GMM/GPR has a 10.4-fold speedup in wall-clock training time compared with scalable exact GPR with a training size of 6500 QM7b-T molecules.
    Memory Bounds for the Experts Problem. (arXiv:2204.09837v1 [cs.DS])
    Online learning with expert advice is a fundamental problem of sequential prediction. In this problem, the algorithm has access to a set of $n$ "experts" who make predictions on each day. The goal on each day is to process these predictions, and make a prediction with the minimum cost. After making a prediction, the algorithm sees the actual outcome on that day, updates its state, and then moves on to the next day. An algorithm is judged by how well it does compared to the best expert in the set. The classical algorithm for this problem is the multiplicative weights algorithm. However, every application, to our knowledge, relies on storing weights for every expert, and uses $\Omega(n)$ memory. There is little work on understanding the memory required to solve the online learning with expert advice problem, or run standard sequential prediction algorithms, in natural streaming models, which is especially important when the number of experts, as well as the number of days on which the experts make predictions, is large. We initiate the study of the learning with expert advice problem in the streaming setting, and show lower and upper bounds. Our lower bound for i.i.d., random order, and adversarial order streams uses a reduction to a custom-built problem using a novel masking technique, to show a smooth trade-off for regret versus memory. Our upper bounds show novel ways to run standard sequential prediction algorithms in rounds on small "pools" of experts, thus reducing the necessary memory. For random-order streams, we show that our upper bound is tight up to low order terms. We hope that these results and techniques will have broad applications in online learning, and can inspire algorithms based on standard sequential prediction techniques, like multiplicative weights, for a wide range of other problems in the memory-constrained setting.
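For reference, the classical multiplicative weights (Hedge) update mentioned above can be sketched as follows. This is the textbook baseline that keeps one weight per expert, which is exactly the $\Omega(n)$ memory cost the paper sets out to study; it is not the paper's streaming algorithm. The function name, the learning rate `eta`, and the toy loss sequence are assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple


def multiplicative_weights(losses: Sequence[Sequence[float]],
                           eta: float) -> Tuple[float, List[float]]:
    """Run Hedge on a loss matrix: losses[t][i] in [0, 1] is expert i's loss on day t.

    Returns the algorithm's total expected loss and the final (unnormalised) weights.
    """
    n = len(losses[0])
    weights = [1.0] * n  # one stored weight per expert: Omega(n) memory
    total_loss = 0.0
    for day in losses:
        norm = sum(weights)
        probs = [w / norm for w in weights]
        # Expected loss when following expert i with probability probs[i].
        total_loss += sum(p * l for p, l in zip(probs, day))
        # Down-weight each expert exponentially in the loss it just incurred.
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, day)]
    return total_loss, weights


# Toy run: expert 0 is always right, the other three are always wrong.
n_experts, n_days, eta = 4, 50, 0.5
loss_matrix = [[0.0] + [1.0] * (n_experts - 1) for _ in range(n_days)]
alg_loss, final_weights = multiplicative_weights(loss_matrix, eta)
best_expert_loss = min(sum(day[i] for day in loss_matrix) for i in range(n_experts))
print(alg_loss - best_expert_loss)  # regret against the best expert
```

The regret of this update is at most $\ln(n)/\eta + \eta T/8$ for losses in $[0, 1]$ over $T$ days, which is why it serves as the usual benchmark for memory-constrained variants.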
    FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis. (arXiv:2204.09934v1 [eess.AS])
    Denoising diffusion probabilistic models (DDPMs) have recently achieved leading performances in many generative tasks. However, the cost of their inherent iterative sampling process has hindered their application to speech synthesis. This paper proposes FastDiff, a fast conditional diffusion model for high-quality speech synthesis. FastDiff employs a stack of time-aware location-variable convolutions of diverse receptive field patterns to efficiently model long-term time dependencies with adaptive conditions. A noise schedule predictor is also adopted to reduce the sampling steps without sacrificing the generation quality. Based on FastDiff, we design an end-to-end text-to-speech synthesizer, FastDiff-TTS, which generates high-fidelity speech waveforms without any intermediate feature (e.g., Mel-spectrogram). Our evaluation of FastDiff demonstrates state-of-the-art results with higher-quality (MOS 4.28) speech samples. Also, FastDiff enables a sampling speed 58x faster than real-time on a V100 GPU, making diffusion models practically applicable to speech synthesis deployment for the first time. We further show that FastDiff generalized well to the mel-spectrogram inversion of unseen speakers, and FastDiff-TTS outperformed other competing methods in end-to-end text-to-speech synthesis. Audio samples are available at https://FastDiff.github.io/.
    Eliminating Backdoor Triggers for Deep Neural Networks Using Attention Relation Graph Distillation. (arXiv:2204.09975v1 [cs.LG])
    Due to the prosperity of Artificial Intelligence (AI) techniques, more and more backdoors are designed by adversaries to attack Deep Neural Networks (DNNs). Although the state-of-the-art method Neural Attention Distillation (NAD) can effectively erase backdoor triggers from DNNs, it still suffers from non-negligible Attack Success Rate (ASR) together with lowered classification ACCuracy (ACC), since NAD focuses on backdoor defense using attention features (i.e., attention maps) of the same order. In this paper, we introduce a novel backdoor defense framework named Attention Relation Graph Distillation (ARGD), which fully explores the correlation among attention features with different orders using our proposed Attention Relation Graphs (ARGs). Based on the alignment of ARGs between both teacher and student models during knowledge distillation, ARGD can eradicate more backdoor triggers than NAD. Comprehensive experimental results show that, against six latest backdoor attacks, ARGD outperforms NAD by up to 94.85% reduction in ASR, while ACC can be improved by up to 3.23%.
    Social Media Sentiment Analysis for Cryptocurrency Market Prediction. (arXiv:2204.10185v1 [cs.CL])
    In this paper, we explore the usability of different natural language processing models for the sentiment analysis of social media applied to financial market prediction, using the cryptocurrency domain as a reference. We study how the different sentiment metrics are correlated with the price movements of Bitcoin. For this purpose, we explore different methods to calculate the sentiment metrics from a text, finding most of them not very accurate for this prediction task. We find that one of the models outperforms more than 20 other public ones and that its interpretable nature makes it possible to fine-tune it efficiently. Thus we confirm that interpretable artificial intelligence and natural language processing methods might be more valuable practically than non-explainable and non-interpretable ones. In the end, we analyse potential causal connections between the different sentiment metrics and the price movements.
    FedCL: Federated Contrastive Learning for Privacy-Preserving Recommendation. (arXiv:2204.09850v1 [cs.LG])
    Contrastive learning is widely used for recommendation model learning, where selecting representative and informative negative samples is critical. Existing methods usually focus on centralized data, where abundant and high-quality negative samples are easy to obtain. However, centralized user data storage and exploitation may lead to privacy risks and concerns, while decentralized user data on a single client can be too sparse and biased for accurate contrastive learning. In this paper, we propose a federated contrastive learning method named FedCL for privacy-preserving recommendation, which can exploit high-quality negative samples for effective model training with privacy well protected. We first infer user embeddings from local user data through the local model on each client, and then perturb them with local differential privacy (LDP) before sending them to a central server for hard negative sampling. Since individual user embedding contains heavy noise due to LDP, we propose to cluster user embeddings on the server to mitigate the influence of noise, and the cluster centroids are used to retrieve hard negative samples from the item pool. These hard negative samples are delivered to user clients and mixed with the observed negative samples from local data as well as in-batch negatives constructed from positive samples for federated model training. Extensive experiments on four benchmark datasets show FedCL can empower various recommendation methods in a privacy-preserving way.
    Adversarial Contrastive Learning by Permuting Cluster Assignments. (arXiv:2204.10314v1 [cs.LG])
    Contrastive learning has gained popularity as an effective self-supervised representation learning technique. Several research directions improve traditional contrastive approaches, e.g., prototypical contrastive methods better capture the semantic similarity among instances and reduce the computational burden by considering cluster prototypes or cluster assignments, while adversarial instance-wise contrastive methods improve robustness against a variety of attacks. To the best of our knowledge, no prior work jointly considers robustness, cluster-wise semantic similarity and computational efficiency. In this work, we propose SwARo, an adversarial contrastive framework that incorporates cluster assignment permutations to generate representative adversarial samples. We evaluate SwARo on multiple benchmark datasets and against various white-box and black-box attacks, obtaining consistent improvements over state-of-the-art baselines.
    TND-NAS: Towards Non-Differentiable Objectives in Differentiable Neural Architecture Search. (arXiv:2111.03892v2 [cs.LG] UPDATED)
    Differentiable architecture search has gradually become the mainstream research topic in the field of Neural Architecture Search (NAS) for its high efficiency compared with the early NAS (EA-based, RL-based) methods. Recent differentiable NAS also aims at further improving the search performance and reducing the GPU-memory consumption. However, these methods are no longer naturally capable of tackling non-differentiable objectives, e.g., energy, resource-constrained efficiency, and other metrics, let alone multi-objective search demands. Research in the multi-objective NAS field targets this but requires vast computational resources because each candidate architecture is optimized on its own. In light of this discrepancy, we propose TND-NAS, which combines the high efficiency of the differentiable NAS framework with the compatibility with non-differentiable metrics of multi-objective NAS. Under the differentiable NAS framework, with the continuous relaxation of the search space, TND-NAS optimizes the architecture parameters ($\alpha$) in discrete space, while resorting to progressive search space shrinking by $\alpha$. Our representative experiment takes two objectives (Parameters, Accuracy) as an example; we achieve a series of high-performance compact architectures on CIFAR10 (1.09M/3.3%, 2.4M/2.95%, 9.57M/2.54%) and CIFAR100 (2.46M/18.3%, 5.46/16.73%, 12.88/15.20%) datasets. Favorably, compared with other multi-objective NAS methods, TND-NAS is less time-consuming (1.3 GPU-days on NVIDIA 1080Ti, 1/6 of that in NSGA-Net), and can be conveniently adapted to real-world NAS scenarios (resource-constrained, platform-specialized).
    Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning. (arXiv:2106.09226v2 [cs.LG] UPDATED)
    Pretrained language models have achieved state-of-the-art performance when adapted to a downstream NLP task. However, theoretical analysis of these models is scarce and challenging since the pretraining and downstream tasks can be very different. We propose an analysis framework that links the pretraining and downstream tasks with an underlying latent variable generative model of text -- the downstream classifier must recover a function of the posterior distribution over the latent variables. We analyze head tuning (learning a classifier on top of the frozen pretrained model) and prompt tuning in this setting. The generative model in our analysis is either a Hidden Markov Model (HMM) or an HMM augmented with a latent memory component, motivated by long-term dependencies in natural language. We show that 1) under certain non-degeneracy conditions on the HMM, simple classification heads can solve the downstream task, 2) prompt tuning obtains downstream guarantees with weaker non-degeneracy conditions, and 3) our recovery guarantees for the memory-augmented HMM are stronger than for the vanilla HMM because task-relevant information is easier to recover from the long-term memory. Experiments on synthetically generated data from HMMs back our theoretical findings.  ( 2 min )
    Surfer100: Generating Surveys From Web Resources on Wikipedia-style. (arXiv:2112.06377v2 [cs.CL] UPDATED)
    Fast-developing fields such as Artificial Intelligence (AI) often outpace the efforts of encyclopedic sources such as Wikipedia, which either do not completely cover recently-introduced topics or lack such content entirely. As a result, methods for automatically producing content are valuable tools to address this information overload. We show that recent advances in pretrained language modeling can be combined for a two-stage extractive and abstractive approach for Wikipedia lead paragraph generation. We extend this approach to generate longer Wikipedia-style summaries with sections and examine how such methods struggle in this application through detailed studies with 100 reference human-collected surveys. This is the first study on utilizing web resources for long Wikipedia-style summaries to the best of our knowledge.  ( 2 min )
    Self-Supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Synchronicity. (arXiv:2111.05329v3 [cs.CV] UPDATED)
    We present CrissCross, a self-supervised framework for learning audio-visual representations. A novel notion is introduced in our framework whereby in addition to learning the intra-modal and standard synchronous cross-modal relations, CrissCross also learns asynchronous cross-modal relationships. We show that by relaxing the temporal synchronicity between the audio and visual modalities, the network learns strong generalized representations. Our experiments show that strong augmentations for both audio and visual modalities with relaxation of cross-modal temporal synchronicity optimize performance. To pretrain our proposed framework, we use 3 different datasets with varying sizes, Kinetics-Sound, Kinetics400, and AudioSet. The learned representations are evaluated on a number of downstream tasks namely action recognition, sound classification, and retrieval. CrissCross shows state-of-the-art performances on action recognition (UCF101 and HMDB51) and sound classification (ESC50 and DCASE). The codes and pretrained models will be made publicly available.  ( 2 min )
    NICO++: Towards Better Benchmarking for Domain Generalization. (arXiv:2204.08040v2 [cs.CV] UPDATED)
    Despite the remarkable performance that modern deep neural networks have achieved on independent and identically distributed (I.I.D.) data, they can crash under distribution shifts. Most current evaluation methods for domain generalization (DG) adopt the leave-one-out strategy as a compromise on the limited number of domains. We propose a large-scale benchmark with extensive labeled domains named NICO++ along with more rational evaluation methods for comprehensively evaluating DG algorithms. To evaluate DG datasets, we propose two metrics to quantify covariate shift and concept shift, respectively. Two novel generalization bounds from the perspective of data construction are proposed to prove that limited concept shift and significant covariate shift favor the evaluation capability for generalization. Through extensive experiments, NICO++ shows its superior evaluation capability compared with current DG datasets and its contribution in alleviating unfairness caused by the leak of oracle knowledge in model selection.
    Relevance-guided Unsupervised Discovery of Abilities with Quality-Diversity Algorithms. (arXiv:2204.09828v1 [cs.NE])
    Quality-Diversity algorithms provide efficient mechanisms to generate large collections of diverse and high-performing solutions, which have been shown to be instrumental for solving downstream tasks. However, most of those algorithms rely on a hand-coded behavioural descriptor to characterise the diversity, hence requiring prior knowledge about the considered tasks. In this work, we introduce Relevance-guided Unsupervised Discovery of Abilities, a Quality-Diversity algorithm that autonomously finds a behavioural characterisation tailored to the task at hand. In particular, our method introduces a custom diversity metric that leads to higher densities of solutions near the areas of interest in the learnt behavioural descriptor space. We evaluate our approach on a simulated robotic environment, where the robot has to autonomously discover its abilities based on its full sensory data. We evaluated the algorithms on three tasks: navigation to random targets, moving forward with a high velocity, and performing half-rolls. The experimental results show that our method manages to discover collections of solutions that are not only diverse, but also well-adapted to the considered downstream task.
    DeepGate: Learning Neural Representations of Logic Gates. (arXiv:2111.14616v3 [cs.LG] UPDATED)
    Applying deep learning (DL) techniques in the electronic design automation (EDA) field has become a trending topic. Most solutions apply well-developed DL models to solve specific EDA problems. While demonstrating promising results, they require careful model tuning for every problem. The fundamental question on "How to obtain a general and effective neural representation of circuits?" has not been answered yet. In this work, we take the first step towards solving this problem. We propose DeepGate, a novel representation learning solution that effectively embeds both logic function and structural information of a circuit as vectors on each gate. Specifically, we propose transforming circuits into unified and-inverter graph format for learning and using signal probabilities as the supervision task in DeepGate. We then introduce a novel graph neural network that uses strong inductive biases in practical circuits as learning priors for signal probability prediction. Our experimental results show the efficacy and generalization capability of DeepGate.
    BTranspose: Bottleneck Transformers for Human Pose Estimation with Self-Supervised Pre-Training. (arXiv:2204.10209v1 [cs.LG])
    The task of 2D human pose estimation is challenging as the number of keypoints is typically large (~ 17) and this necessitates the use of robust neural network architectures and training pipelines that can capture the relevant features from the input image. These features are then aggregated to make accurate heatmap predictions from which the final keypoints of human body parts can be inferred. Many papers in the literature use CNN-based architectures for the backbone, and/or combine it with a transformer, after which the features are aggregated to make the final keypoint predictions [1]. In this paper, we consider the recently proposed Bottleneck Transformers [2], which combine CNN and multi-head self attention (MHSA) layers effectively, and we integrate them with a Transformer encoder and apply the result to the task of 2D human pose estimation. We consider different backbone architectures and pre-train them using the DINO self-supervised learning method [3]; this pre-training is found to improve the overall prediction accuracy. We call our model BTranspose, and experiments show that on the COCO validation set, our model achieves an AP of 76.4, which is competitive with other methods such as [1] and has fewer network parameters. Furthermore, we also present the dependencies of the final predicted keypoints on both the MHSA block and the Transformer encoder layers, providing clues on the image sub-regions the network attends to at the mid and high levels.
    Understanding the Domain Gap in LiDAR Object Detection Networks. (arXiv:2204.10024v1 [cs.CV])
    In order to make autonomous driving a reality, artificial neural networks have to work reliably in the open-world. However, the open-world is vast and continuously changing, so it is not technically feasible to collect and annotate training datasets which accurately represent this domain. Therefore, there are always domain gaps between training datasets and the open-world which must be understood. In this work, we investigate the domain gaps between high-resolution and low-resolution LiDAR sensors in object detection networks. Using a unique dataset, which enables us to study sensor resolution domain gaps independent of other effects, we show two distinct domain gaps - an inference domain gap and a training domain gap. The inference domain gap is characterised by a strong dependence on the number of LiDAR points per object, while the training gap shows no such dependence. These findings show that different approaches are required to close these inference and training domain gaps.
    Towards Reliable Neural Generative Modeling of Detectors. (arXiv:2204.09947v1 [physics.ins-det])
    The increasing luminosities of future data taking at the Large Hadron Collider and next generation collider experiments require an unprecedented amount of simulated events to be produced. Such large scale productions demand a significant amount of valuable computing resources. This brings a demand to use new approaches to event generation and simulation of detector responses. In this paper, we discuss the application of generative adversarial networks (GANs) to the simulation of the LHCb experiment events. We emphasize the main pitfalls in the application of GANs and study the systematic effects in detail. The presented results are based on the Geant4 simulation of the LHCb Cherenkov detector.
    Conditional entropy minimization principle for learning domain invariant representation features. (arXiv:2201.10460v3 [cs.LG] UPDATED)
    Invariance principle-based methods, for example, Invariant Risk Minimization (IRM), have recently emerged as promising approaches for Domain Generalization (DG). Despite the promising theory, invariance principle-based approaches fail in common classification tasks due to the mixture of the true invariant features and the spurious invariant features. In this paper, we propose a framework based on the conditional entropy minimization principle to filter out the spurious invariant features leading to a new algorithm with a better generalization capability. We theoretically prove that under some particular assumptions, the representation function can precisely recover the true invariant features. In addition, we also show that the proposed approach is closely related to the well-known Information Bottleneck (IB) framework. Both the theoretical and numerical results are provided to justify our approach.
    Scalable Sensitivity and Uncertainty Analysis for Causal-Effect Estimates of Continuous-Valued Interventions. (arXiv:2204.10022v1 [cs.LG])
    Estimating the effects of continuous-valued interventions from observational data is critically important in fields such as climate science, healthcare, and economics. Recent work focuses on designing neural-network architectures and regularization functions to allow for scalable estimation of average and individual-level dose response curves from high-dimensional, large-sample data. Such methodologies assume ignorability (all confounding variables are observed) and positivity (all levels of treatment can be observed for every unit described by a given covariate value), which are especially challenged in the continuous treatment regime. Developing scalable sensitivity and uncertainty analyses that allow us to understand the ignorance induced in our estimates when these assumptions are relaxed receives less attention. Here, we develop a continuous treatment-effect marginal sensitivity model (CMSM) and derive bounds that agree with both the observed data and a researcher-defined level of hidden confounding. We introduce a scalable algorithm to derive the bounds and uncertainty-aware deep models to efficiently estimate these bounds for high-dimensional, large-sample observational data. We validate our methods using both synthetic and real-world experiments. For the latter, we work in concert with climate scientists interested in evaluating the climatological impacts of human emissions on cloud properties using satellite observations from the past 15 years: a finite-data problem known to be complicated by the presence of a multitude of unobserved confounders.
    Intact-VAE: Estimating Treatment Effects under Unobserved Confounding. (arXiv:2101.06662v3 [stat.ML] UPDATED)
    NOTE: This preprint has a flawed theoretical formulation. Please avoid it and refer to the ICLR22 publication https://openreview.net/forum?id=q7n2RngwOM. Also, arXiv:2109.15062 contains some new ideas on unobserved Confounding. As an important problem of causal inference, we discuss the identification and estimation of treatment effects under unobserved confounding. Representing the confounder as a latent variable, we propose Intact-VAE, a new variant of variational autoencoder (VAE), motivated by the prognostic score that is sufficient for identifying treatment effects. We theoretically show that, under certain settings, treatment effects are identified by our model, and further, based on the identifiability of our model (i.e., determinacy of representation), our VAE is a consistent estimator with representation balanced for treatment groups. Experiments on (semi-)synthetic datasets show state-of-the-art performance under diverse settings.
    Learning Forward Dynamics Model and Informed Trajectory Sampler for Safe Quadruped Navigation. (arXiv:2204.08647v3 [cs.RO] UPDATED)
    For autonomous quadruped robot navigation in various complex environments, a typical SOTA system is composed of four main modules -- mapper, global planner, local planner, and command-tracking controller -- in a hierarchical manner. In this paper, we build a robust and safe local planner which is designed to generate a velocity plan to track a coarsely planned path from the global planner. Previous works used waypoint-based methods (e.g. Proportional-Differential control and pure pursuit) which simplify the path tracking problem to local point-goal navigation. However, they suffer from frequent collisions in geometrically complex and narrow environments because of two reasons; the global planner uses a coarse and inaccurate model and the local planner is unable to track the global plan sufficiently well. Currently, deep learning methods are an appealing alternative because they can learn safety and path feasibility from experience more accurately. However, existing deep learning methods are not capable of planning for a long horizon. In this work, we propose a learning-based fully autonomous navigation framework composed of three innovative elements: a learned forward dynamics model (FDM), an online sampling-based model-predictive controller, and an informed trajectory sampler (ITS). Using our framework, a quadruped robot can autonomously navigate in various complex environments without a collision and generate a smoother command plan compared to the baseline method. Furthermore, our method can reactively handle unexpected obstacles on the planned path and avoid them. Project page https://awesomericky.github.io/projects/FDM_ITS_navigation/.
    Revisiting Consistency Regularization for Semi-supervised Change Detection in Remote Sensing Images. (arXiv:2204.08454v3 [cs.CV] UPDATED)
    Remote-sensing (RS) Change Detection (CD) aims to detect "changes of interest" from co-registered bi-temporal images. The performance of existing deep supervised CD methods is attributed to the large amounts of annotated data used to train the networks. However, annotating large amounts of remote sensing images is labor-intensive and expensive, particularly with bi-temporal images, as it requires pixel-wise comparisons by a human expert. On the other hand, we often have access to unlimited unlabeled multi-temporal RS imagery thanks to ever-increasing earth observation programs. In this paper, we propose a simple yet effective way to leverage the information from unlabeled bi-temporal images to improve the performance of CD approaches. More specifically, we propose a semi-supervised CD model in which we formulate an unsupervised CD loss in addition to the supervised Cross-Entropy (CE) loss by constraining the output change probability map of a given unlabeled bi-temporal image pair to be consistent under the small random perturbations applied on the deep feature difference map that is obtained by subtracting their latent feature representations. Experiments conducted on two publicly available CD datasets show that the proposed semi-supervised CD method can reach closer to the performance of supervised CD even with access to as little as 10% of the annotated training data. Code available at https://github.com/wgcban/SemiCD
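The unsupervised consistency term described in the abstract above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper or its repository: the `decoder` callable, the Gaussian form of the perturbation, the `noise_std` parameter, and the use of mean squared error as the consistency measure.

```python
import math
import random
from typing import Callable, List, Optional, Sequence


def mean_squared_error(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared difference between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def consistency_loss(
    feat_t1: Sequence[float],
    feat_t2: Sequence[float],
    decoder: Callable[[List[float]], List[float]],
    noise_std: float = 0.1,
    rng: Optional[random.Random] = None,
) -> float:
    """Penalise changes in the predicted change map under a small perturbation.

    feat_t1, feat_t2: latent features of the two co-registered images
                      (flattened to plain lists for this sketch).
    decoder:          hypothetical function mapping a feature difference map
                      to per-pixel change probabilities.
    """
    if rng is None:
        rng = random.Random(0)
    # Feature difference map: subtract the latent representations.
    diff = [a - b for a, b in zip(feat_t1, feat_t2)]
    # Small random perturbation applied to the difference map (assumed Gaussian).
    perturbed = [d + rng.gauss(0.0, noise_std) for d in diff]
    # The two predictions should agree; MSE is an assumed choice of distance.
    return mean_squared_error(decoder(diff), decoder(perturbed))


def toy_decoder(diff: List[float]) -> List[float]:
    """Stand-in decoder: element-wise sigmoid of the feature difference."""
    return [1.0 / (1.0 + math.exp(-d)) for d in diff]


loss = consistency_loss([0.2, 1.5, -0.3], [0.1, -0.5, -0.3], toy_decoder)
```

In a semi-supervised setup of the kind the abstract describes, a term like this would be computed on unlabeled image pairs and added to the supervised cross-entropy loss on the labeled ones; the weighting between the two is not specified here.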
    Wrapped Distributions on homogeneous Riemannian manifolds. (arXiv:2204.09790v1 [math.ST])
    We provide a general framework for constructing probability distributions on Riemannian manifolds, taking advantage of area-preserving maps and isometries. Control over distributions' properties, such as parameters, symmetry and modality, yields a family of flexible distributions that are straightforward to sample from, suitable for use within Monte Carlo algorithms and latent variable models, such as autoencoders. As an illustration, we empirically validate our approach by utilizing our proposed distributions within a variational autoencoder and a latent space network model. Finally, we take advantage of the generalized description of this framework to posit questions for future work.
    Debiased Learning from Naturally Imbalanced Pseudo-Labels. (arXiv:2201.01490v2 [cs.LG] UPDATED)
    Pseudo-labels are confident predictions made on unlabeled target data by a classifier trained on labeled source data. They are widely used for adapting a model to unlabeled data, e.g., in a semi-supervised learning setting. Our key insight is that pseudo-labels are naturally imbalanced due to intrinsic data similarity, even when a model is trained on balanced source data and evaluated on balanced target data. If we address this previously unknown imbalanced classification problem arising from pseudo-labels instead of ground-truth training labels, we could remove model biases towards false majorities created by pseudo-labels. We propose a novel and effective debiased learning method with pseudo-labels, based on counterfactual reasoning and adaptive margins: The former removes the classifier response bias, whereas the latter adjusts the margin of each class according to the imbalance of pseudo-labels. Validated by extensive experimentation, our simple debiased learning delivers significant accuracy gains over the state-of-the-art on ImageNet-1K: 26% for semi-supervised learning with 0.2% annotations and 9% for zero-shot learning. Our code is available at: https://github.com/frank-xwang/debiased-pseudo-labeling.
    Cross-Speaker Emotion Transfer for Low-Resource Text-to-Speech Using Non-Parallel Voice Conversion with Pitch-Shift Data Augmentation. (arXiv:2204.10020v1 [eess.AS])
    Data augmentation via voice conversion (VC) has been successfully applied to low-resource expressive text-to-speech (TTS) when only neutral data for the target speaker are available. Although the quality of VC is crucial for this approach, it is challenging to learn a stable VC model because the amount of data is limited in low-resource scenarios, and highly expressive speech has large acoustic variety. To address this issue, we propose a novel data augmentation method that combines pitch-shifting and VC techniques. Because pitch-shift data augmentation enables the coverage of a variety of pitch dynamics, it greatly stabilizes training for both VC and TTS models, even when only 1,000 utterances of the target speaker's neutral data are available. Subjective test results showed that a FastSpeech 2-based emotional TTS system with the proposed method improved naturalness and emotional similarity compared with conventional methods.
    One-Step Abductive Multi-Target Learning with Diverse Noisy Samples: An Application to Tumour Segmentation for Breast Cancer. (arXiv:2110.10325v5 [cs.LG] UPDATED)
    Recent studies have demonstrated the effectiveness of the combination of machine learning and logical reasoning in inventing advanced artificial intelligence technologies. One-step abductive multi-target learning (OSAMTL), an approach that only combines machine learning and logical reasoning in a one-step balanced way, has also shown its effectiveness in handling complex noisy labels of a single noisy sample in medical histopathology whole slide image analysis (MHWSIA). However, OSAMTL is not suitable for the situation where diverse noisy samples (DiNS) are provided for a learning task. In this paper, after giving a definition of DiNS, we propose one-step abductive multi-target learning with DiNS (OSAMTL-DiNS) to expand the original OSAMTL to handle complex noisy labels of DiNS. Applying OSAMTL-DiNS to tumour segmentation for breast cancer in MHWSIA, we show that OSAMTL-DiNS enables various state-of-the-art approaches for learning from noisy labels to achieve more rational predictions.  ( 2 min )
    Dynamical simulation via quantum machine learning with provable generalization. (arXiv:2204.10269v1 [quant-ph])
    Much attention has been paid to dynamical simulation and quantum machine learning (QML) independently as applications for quantum advantage, while the possibility of using QML to enhance dynamical simulations has not been thoroughly investigated. Here we develop a framework for using QML methods to simulate quantum dynamics on near-term quantum hardware. We use generalization bounds, which bound the error a machine learning model makes on unseen data, to rigorously analyze the training data requirements of an algorithm within this framework. This provides a guarantee that our algorithm is resource-efficient, both in terms of qubit and data requirements. Our numerics exhibit efficient scaling with problem size, and we simulate 20 times longer than Trotterization on IBMQ-Bogota.  ( 2 min )
    Handling Imbalanced Classification Problems With Support Vector Machines via Evolutionary Bilevel Optimization. (arXiv:2204.10231v1 [cs.LG])
    Support vector machines (SVMs) are popular learning algorithms to deal with binary classification problems. They traditionally assume equal misclassification costs for each class; however, real-world problems may have an uneven class distribution. This article introduces EBCS-SVM: evolutionary bilevel cost-sensitive SVMs. EBCS-SVM handles imbalanced classification problems by simultaneously learning the support vectors and optimizing the SVM hyperparameters, which comprise the kernel parameter and misclassification costs. The resulting optimization problem is a bilevel problem, where the lower level determines the support vectors and the upper level the hyperparameters. This optimization problem is solved using an evolutionary algorithm (EA) at the upper level and sequential minimal optimization (SMO) at the lower level. These two methods work in a nested fashion, that is, the optimal support vectors help guide the search of the hyperparameters, and the lower level is initialized based on previous successful solutions. The proposed method is assessed using 70 datasets of imbalanced classification and compared with several state-of-the-art methods. The experimental results, supported by a Bayesian test, provided evidence of the effectiveness of EBCS-SVM when working with highly imbalanced datasets.  ( 2 min )
    DooDLeNet: Double DeepLab Enhanced Feature Fusion for Thermal-color Semantic Segmentation. (arXiv:2204.10266v1 [cs.LG])
    In this paper we present a new approach for feature fusion between RGB and LWIR Thermal images for the task of semantic segmentation for driving perception. We propose DooDLeNet, a double DeepLab architecture with specialized encoder-decoders for thermal and color modalities and a shared decoder for final segmentation. We combine two strategies for feature fusion: confidence weighting and correlation weighting. We report state-of-the-art mean IoU results on the MF dataset.  ( 2 min )
    Graph Convolutional Networks for Multi-modality Medical Imaging: Methods, Architectures, and Clinical Applications. (arXiv:2202.08916v3 [eess.IV] UPDATED)
    Image-based characterization and disease understanding involve integrative analysis of morphological, spatial, and topological information across biological scales. The development of graph convolutional networks (GCNs) has created the opportunity to address this information complexity via graph-driven architectures, since GCNs can perform feature aggregation, interaction, and reasoning with remarkable flexibility and efficiency. These GCNs capabilities have spawned a new wave of research in medical imaging analysis with the overarching goal of improving quantitative disease understanding, monitoring, and diagnosis. Yet daunting challenges remain for designing the important image-to-graph transformation for multi-modality medical imaging and gaining insights into model interpretation and enhanced clinical decision support. In this review, we present recent GCNs developments in the context of medical image analysis including imaging data from radiology and histopathology. We discuss the fast-growing use of graph network architectures in medical image analysis to improve disease diagnosis and patient outcomes in clinical practice. To foster cross-disciplinary research, we present GCNs technical advancements, emerging medical applications, identify common challenges in the use of image-based GCNs and their extensions in model interpretation, large-scale benchmarks that promise to transform the scope of medical image studies and related graph-driven medical research.  ( 2 min )
    Physical Modeling using Recurrent Neural Networks with Fast Convolutional Layers. (arXiv:2204.10125v1 [cs.SD])
    Discrete-time modeling of acoustic, mechanical and electrical systems is a prominent topic in the musical signal processing literature. Such models are mostly derived by discretizing a mathematical model, given in terms of ordinary or partial differential equations, using established techniques. Recent work has applied the techniques of machine-learning to construct such models automatically from data for the case of systems which have lumped states described by scalar values, such as electrical circuits. In this work, we examine how similar techniques are able to construct models of systems which have spatially distributed rather than lumped states. We describe several novel recurrent neural network structures, and show how they can be thought of as an extension of modal techniques. As a proof of concept, we generate synthetic data for three physical systems and show that the proposed network structures can be trained with this data to reproduce the behavior of these systems.
    OCTOPUS -- optical coherence tomography plaque and stent analysis software. (arXiv:2204.10212v1 [eess.IV])
    Compared with other imaging modalities, intravascular optical coherence tomography (IVOCT) has significant advantages for guiding percutaneous coronary interventions. To aid IVOCT research studies, we developed the Optical Coherence TOmography PlaqUe and Stent (OCTOPUS) analysis software. To automate image analysis results, the software includes several important algorithmic steps: pre-processing, deep learning plaque segmentation, machine learning identification of stent struts, and registration of pullbacks. Interactive visualization and manual editing of segmentations were included in the software. Quantifications include stent deployment characteristics (e.g., stent strut malapposition), strut level analysis, calcium angle, and calcium thickness measurements. Interactive visualizations include (x,y) anatomical, en face, and longitudinal views with optional overlays. The underlying plaque segmentation algorithm yielded excellent pixel-wise results (86.2% sensitivity and 0.781 F1 score). Using OCTOPUS on 34 new pullbacks, we determined that following automated segmentation, only 13% and 23% of frames needed any manual touch up for detailed lumen and calcification labeling, respectively. Only up to 3.8% of plaque pixels were modified, leading to an average editing time of only 7.5 seconds/frame, an approximately 80% reduction compared to manual analysis. Regarding stent analysis, sensitivity and precision were both greater than 90%, and each strut was successfully classified as either covered or uncovered with high sensitivity (94%) and specificity (90%). We introduced and evaluated the clinical application of a highly automated software package, OCTOPUS, for quantitative plaque and stent analysis in IVOCT images. The software is currently used as an offline tool for research purposes; however, the software's embedded algorithms may also be useful for real-time treatment planning.  ( 2 min )
    Sketch2PQ: Freeform Planar Quadrilateral Mesh Design via a Single Sketch. (arXiv:2201.09367v3 [cs.GR] UPDATED)
    The freeform architectural modeling process often involves two important stages: concept design and digital modeling. In the first stage, architects usually sketch the overall 3D shape and the panel layout on a physical or digital paper briefly. In the second stage, a digital 3D model is created using the sketch as a reference. The digital model needs to incorporate geometric requirements for its components, such as the planarity of panels due to consideration of construction costs, which can make the modeling process more challenging. In this work, we present a novel sketch-based system to bridge the concept design and digital modeling of freeform roof-like shapes represented as planar quadrilateral (PQ) meshes. Our system allows the user to sketch the surface boundary and contour lines under axonometric projection and supports the sketching of occluded regions. In addition, the user can sketch feature lines to provide directional guidance to the PQ mesh layout. Given the 2D sketch input, we propose a deep neural network to infer in real-time the underlying surface shape along with a dense conjugate direction field, both of which are used to extract the final PQ mesh. To train and validate our network, we generate a large synthetic dataset that mimics architect sketching of freeform quadrilateral patches. The effectiveness and usability of our system are demonstrated with quantitative and qualitative evaluation as well as user studies.  ( 2 min )
    Assessing Machine Learning Algorithms for Near-Real Time Bus Ridership Prediction During Extreme Weather. (arXiv:2204.09792v1 [stat.AP])
    Given an increasingly volatile climate, the relationship between weather and transit ridership has drawn increasing interest. However, challenges stemming from spatio-temporal dependency and non-stationarity have not been fully addressed in modelling and predicting transit ridership under the influence of weather conditions especially with the traditional statistical approaches. Drawing on three-month smart card data in Brisbane, Australia, this research adopts and assesses a suite of machine-learning algorithms, i.e., random forest, eXtreme Gradient Boosting (XGBoost) and Tweedie XGBoost, to model and predict near real-time bus ridership in relation to sudden change of weather conditions. The study confirms that there indeed exists a significant level of spatio-temporal variability of weather-ridership relationship, which produces equally dynamic patterns of prediction errors. Further comparison of model performance suggests that Tweedie XGBoost outperforms the other two machine-learning algorithms in generating overall more accurate prediction outcomes in space and time. Future research may advance the current study by drawing on larger data sets and applying more advanced machine and deep-learning approaches to provide more enhanced evidence for real-time operation of transit systems.  ( 2 min )
    The 2021 NIST Speaker Recognition Evaluation. (arXiv:2204.10242v1 [eess.AS])
    The 2021 Speaker Recognition Evaluation (SRE21) was the latest cycle of the ongoing evaluation series conducted by the U.S. National Institute of Standards and Technology (NIST) since 1996. It was the second large-scale multimodal speaker/person recognition evaluation organized by NIST (the first one being SRE19). Similar to SRE19, it featured two core evaluation tracks, namely audio and audio-visual, as well as an optional visual track. In addition to offering fixed and open training conditions, it also introduced new challenges for the community, thanks to a new multimodal (i.e., audio, video, and selfie images) and multilingual (i.e., with multilingual speakers) corpus, termed WeCanTalk, collected outside North America by the Linguistic Data Consortium (LDC). These challenges included: 1) trials (target and non-target) with enrollment and test segments originating from different domains (i.e., telephony versus video), and 2) trials (target and non-target) with enrollment and test segments spoken in different languages (i.e., cross-lingual trials). This paper presents an overview of SRE21 including the tasks, performance metric, data, evaluation protocol, results and system performance analyses. A total of 23 organizations (forming 15 teams) from academia and industry participated in SRE21 and submitted 158 valid system outputs. Evaluation results indicate: audio-visual fusion produces substantial gains in performance over audio-only or visual-only systems; top performing speaker and face recognition systems exhibited comparable performance under the matched domain conditions present in this evaluation; and the use of complex neural network architectures (e.g., ResNet) along with angular losses with margin, data augmentation, as well as long duration fine-tuning contributed to notable performance improvements for the audio-only speaker recognition task.  ( 2 min )
    Radio Galaxy Zoo: Using semi-supervised learning to leverage large unlabelled data-sets for radio galaxy classification under data-set shift. (arXiv:2204.08816v3 [astro-ph.GA] UPDATED)
    In this work we examine the classification accuracy and robustness of a state-of-the-art semi-supervised learning (SSL) algorithm applied to the morphological classification of radio galaxies. We test if SSL with fewer labels can achieve test accuracies comparable to the supervised state-of-the-art and whether this holds when incorporating previously unseen data. We find that for the radio galaxy classification problem considered, SSL provides additional regularisation and outperforms the baseline test accuracy. However, in contrast to model performance metrics reported on computer science benchmarking data-sets, we find that improvement is limited to a narrow range of label volumes, with performance falling off rapidly at low label volumes. Additionally, we show that SSL does not improve model calibration, regardless of whether classification is improved. Moreover, we find that when different underlying catalogues drawn from the same radio survey are used to provide the labelled and unlabelled data-sets required for SSL, a significant drop in classification performance is observed, highlighting the difficulty of applying SSL techniques under data-set shift. We show that a class-imbalanced unlabelled data pool negatively affects performance through prior probability shift, which we suggest may explain this performance drop, and that using the Frechet Distance between labelled and unlabelled data-sets as a measure of data-set shift can provide a prediction of model performance, but that for typical radio galaxy data-sets with labelled sample volumes of O(1000), the sample variance associated with this technique is high and the technique is in general not sufficiently robust to replace a train-test cycle.  ( 2 min )
    Linear convergence of a policy gradient method for finite horizon continuous time stochastic control problems. (arXiv:2203.11758v2 [math.OC] UPDATED)
    Despite its popularity in the reinforcement learning community, a provably convergent policy gradient method for general continuous space-time stochastic control problems has been elusive. This paper closes the gap by proposing a proximal gradient algorithm for feedback controls of finite-time horizon stochastic control problems. The state dynamics are continuous time nonlinear diffusions with controlled drift and possibly degenerate noise, and the objectives are nonconvex in the state and nonsmooth in the control. We prove under suitable conditions that the algorithm converges linearly to a stationary point of the control problem, and is stable with respect to policy updates by approximate gradient steps. The convergence result justifies the recent reinforcement learning heuristics that adding entropy regularization or a fictitious discount factor to the optimization objective accelerates the convergence of policy gradient methods. The proof exploits careful regularity estimates of backward stochastic differential equations.  ( 2 min )
    Scalable One-Pass Optimisation of High-Dimensional Weight-Update Hyperparameters by Implicit Differentiation. (arXiv:2110.10461v3 [cs.LG] UPDATED)
    Machine learning training methods depend plentifully and intricately on hyperparameters, motivating automated strategies for their optimisation. Many existing algorithms restart training for each new hyperparameter choice, at considerable computational cost. Some hypergradient-based one-pass methods exist, but these either cannot be applied to arbitrary optimiser hyperparameters (such as learning rates and momenta) or take several times longer to train than their base models. We extend these existing methods to develop an approximate hypergradient-based hyperparameter optimiser which is applicable to any continuous hyperparameter appearing in a differentiable model weight update, yet requires only one training episode, with no restarts. We also provide a motivating argument for convergence to the true hypergradient, and perform tractable gradient-based optimisation of independent learning rates for each model parameter. Our method performs competitively from varied random hyperparameter initialisations on several UCI datasets and Fashion-MNIST (using a one-layer MLP), Penn Treebank (using an LSTM) and CIFAR-10 (using a ResNet-18), in time only 2-3x greater than vanilla training.
    Fink: early supernovae Ia classification using active learning. (arXiv:2111.11438v2 [astro-ph.IM] UPDATED)
    We describe how the Fink broker early supernova Ia classifier optimizes its ML classifications by employing an active learning (AL) strategy. We demonstrate the feasibility of implementation of such strategies in the current Zwicky Transient Facility (ZTF) public alert data stream. We compare the performance of two AL strategies: uncertainty sampling and random sampling. Our pipeline consists of 3 stages: feature extraction, classification and learning strategy. Starting from an initial sample of 10 alerts (5 SN Ia and 5 non-Ia), we let the algorithm identify which alert should be added to the training sample. The system is allowed to evolve through 300 iterations. Our data set consists of 23 840 alerts from the ZTF with confirmed classification via cross-match with SIMBAD database and the Transient name server (TNS), 1 600 of which were SNe Ia (1 021 unique objects). The data configuration, after the learning cycle was completed, consists of 310 alerts for training and 23 530 for testing. Averaging over 100 realizations, the classifier achieved 89% purity and 54% efficiency. From 01/November/2020 to 31/October/2021 Fink has applied its early supernova Ia module to the ZTF stream and communicated promising SN Ia candidates to the TNS. From the 535 spectroscopically classified Fink candidates, 459 (86%) were proven to be SNe Ia. Our results confirm the effectiveness of active learning strategies for guiding the construction of optimal training samples for astronomical classifiers. It demonstrates in real data that the performance of learning algorithms can be highly improved without the need of extra computational resources or overwhelmingly large training samples. This is, to our knowledge, the first application of AL to real alerts data.
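The uncertainty-sampling strategy mentioned in the abstract above can be sketched roughly as follows. This is an illustrative outline only, not the Fink implementation: the `fit_predict` callback, the function names, and the binary "closest to 0.5" uncertainty criterion are assumptions introduced for the example.

```python
from typing import Callable, List, Sequence, Tuple


def select_most_uncertain(probs: Sequence[float]) -> int:
    """Index of the pool item whose predicted class probability is closest to 0.5."""
    if not probs:
        raise ValueError("pool is empty")
    return min(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))


def run_active_learning(
    pool: Sequence[float],
    train: Sequence[float],
    fit_predict: Callable[[List[float], List[float]], List[float]],
    n_iterations: int,
) -> Tuple[List[float], List[float]]:
    """Move one item per iteration from the pool into the training sample.

    fit_predict is a hypothetical callback: it retrains a classifier on the
    current training sample and returns one probability per pool item.
    """
    pool, train = list(pool), list(train)
    for _ in range(n_iterations):
        if not pool:
            break
        probs = fit_predict(train, pool)
        query = select_most_uncertain(probs)
        # In a real pipeline the queried item would be labelled at this point.
        train.append(pool.pop(query))
    return train, pool


def toy_fit_predict(train: List[float], pool: List[float]) -> List[float]:
    """Stand-in classifier: treats each pool value as its own probability."""
    return list(pool)


new_train, remaining = run_active_learning(
    pool=[0.9, 0.55, 0.2, 0.05], train=[], fit_predict=toy_fit_predict, n_iterations=2
)
```

Replacing `select_most_uncertain` with a uniformly random index gives the random-sampling baseline that the abstract compares against.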
    Path sampling of recurrent neural networks by incorporating known physics. (arXiv:2203.00597v2 [cond-mat.dis-nn] UPDATED)
    Recurrent neural networks have seen widespread use in modeling dynamical systems in varied domains such as weather prediction, text prediction and several others. Often one wishes to supplement the experimentally observed dynamics with prior knowledge or intuition about the system. While the recurrent nature of these networks allows them to model arbitrarily long memories in the time series used in training, it makes it harder to impose prior knowledge or intuition through generic constraints. In this work, we present a path sampling approach based on the principle of Maximum Caliber that allows us to include generic thermodynamic or kinetic constraints into recurrent neural networks. We show the method here for a widely used type of recurrent neural network known as the long short-term memory network in the context of supplementing time series collected from different application domains. These include classical Molecular Dynamics of a protein and Monte Carlo simulations of an open quantum system continuously losing photons to the environment and displaying Rabi oscillations. Our method can be easily generalized to other generative artificial intelligence models and to generic time series in different areas of physical and social sciences, where one wishes to supplement limited data with intuition or theory based corrections.
    Deep Learning meets Nonparametric Regression: Are Weight-Decayed DNNs Locally Adaptive?. (arXiv:2204.09664v2 [cs.LG] UPDATED)
    We study the theory of neural networks (NNs) through the lens of classical nonparametric regression problems with a focus on NNs' ability to adaptively estimate functions with heterogeneous smoothness -- a property of functions in Besov or Bounded Variation (BV) classes. Existing work on this problem requires tuning the NN architecture based on the function spaces and sample sizes. We consider a "Parallel NN" variant of deep ReLU networks and show that the standard weight decay is equivalent to promoting the $\ell_p$-sparsity ($0<p<1$) of the coefficient vector of an end-to-end learned function bases, i.e., a dictionary. Using this equivalence, we further establish that by tuning only the weight decay, such Parallel NN achieves an estimation error arbitrarily close to the minimax rates for both the Besov and BV classes. Notably, it gets exponentially closer to minimax optimal as the NN gets deeper. Our research sheds new light on why depth matters and how NNs are more powerful than kernel methods.
    An Improved Transfer Model: Randomized Transferable Machine. (arXiv:2011.13629v2 [cs.LG] UPDATED)
    Feature-based transfer is one of the most effective methodologies for transfer learning. Existing studies usually assume that the learned new feature representation is domain-invariant, and thus train a transfer model $\mathcal{M}$ on the source domain. In this paper, we consider a more realistic scenario where the new feature representation is suboptimal and small divergence still exists across domains. We propose a new transfer model called Randomized Transferable Machine (RTM) to handle such small divergence of domains. Specifically, we work on the new source and target data learned from existing feature-based transfer methods. The key idea is to enlarge source training data populations by randomly corrupting the new source data with noise, and then train a transfer model $\widetilde{\mathcal{M}}$ that performs well on all the corrupted source data populations. In principle, the more corruptions are made, the higher the probability that the new target data are covered by the constructed source data populations, and thus better transfer performance can be achieved by $\widetilde{\mathcal{M}}$. The ideal case is infinitely many corruptions, which however is infeasible in reality. We develop a marginalized solution that enables training $\widetilde{\mathcal{M}}$ without conducting any corruption, yet is equivalent to training on infinitely many noisy source data populations. We further propose two instantiations of $\widetilde{\mathcal{M}}$, which theoretically show the transfer superiority over the conventional transfer model $\mathcal{M}$. More importantly, both instantiations have closed-form solutions, leading to a fast and efficient training process. Experiments on various real-world transfer tasks show that RTM is a promising transfer model.
    Towards Deepening Graph Neural Networks: A GNTK-based Optimization Perspective. (arXiv:2103.03113v3 [cs.LG] UPDATED)
    Graph convolutional networks (GCNs) and their variants have achieved great success in dealing with graph-structured data. Nevertheless, it is well known that deep GCNs suffer from the over-smoothing problem, where node representations tend to be indistinguishable as more layers are stacked up. The theoretical research to date on deep GCNs has focused primarily on expressive power rather than trainability, an optimization perspective. Compared to expressivity, trainability attempts to address a more fundamental question: Given a sufficiently expressive space of models, can we successfully find a good solution via gradient descent-based optimizers? This work fills this gap by exploiting the Graph Neural Tangent Kernel (GNTK), which governs the optimization trajectory under gradient descent for wide GCNs. We formulate the asymptotic behaviors of GNTK in the large depth, which enables us to reveal that the trainability of wide and deep GCNs drops at an exponential rate in the optimization process. Additionally, we extend our theoretical framework to analyze residual connection-based techniques, which are found to mitigate the exponential decay of trainability only mildly. Inspired by our theoretical insights on trainability, we propose Critical DropEdge, a connectivity-aware and graph-adaptive sampling method, to alleviate the exponential decay problem more fundamentally. Experimental evaluation consistently confirms that our proposed method achieves better results than relevant counterparts in both infinite-width and finite-width settings.
    Hybrid Memoised Wake-Sleep: Approximate Inference at the Discrete-Continuous Interface. (arXiv:2107.06393v2 [cs.CV] UPDATED)
    Modeling complex phenomena typically involves the use of both discrete and continuous variables. Such a setting applies across a wide range of problems, from identifying trends in time-series data to performing effective compositional scene understanding in images. Here, we propose Hybrid Memoised Wake-Sleep (HMWS), an algorithm for effective inference in such hybrid discrete-continuous models. Prior approaches to learning suffer as they need to perform repeated expensive inner-loop discrete inference. We build on a recent approach, Memoised Wake-Sleep (MWS), which alleviates part of the problem by memoising discrete variables, and extend it to allow for a principled and effective way to handle continuous variables by learning a separate recognition model used for importance-sampling based approximate inference and marginalization. We evaluate HMWS in the GP-kernel learning and 3D scene understanding domains, and show that it outperforms current state-of-the-art inference methods.
    Out-of-distribution generalization for learning quantum dynamics. (arXiv:2204.10268v1 [quant-ph])
    Generalization bounds are a critical tool to assess the training data requirements of Quantum Machine Learning (QML). Recent work has established guarantees for in-distribution generalization of quantum neural networks (QNNs), where training and testing data are assumed to be drawn from the same data distribution. However, there are currently no results on out-of-distribution generalization in QML, where we require a trained model to perform well even on data drawn from a distribution different from the training distribution. In this work, we prove out-of-distribution generalization for the task of learning an unknown unitary using a QNN and for a broad class of training and testing distributions. In particular, we show that one can learn the action of a unitary on entangled states using only product state training data. We numerically illustrate this by showing that the evolution of a Heisenberg spin chain can be learned using only product training states. Since product states can be prepared using only single-qubit gates, this advances the prospects of learning quantum dynamics using near term quantum computers and quantum experiments, and further opens up new methods for both the classical and quantum compilation of quantum circuits.  ( 2 min )
    Anti-Jamming Games in Multi-Band Wireless Ad Hoc Networks. (arXiv:2111.11178v2 [cs.IT] UPDATED)
    For multi-band wireless ad hoc networks of multiple users, an anti-jamming game between the users and a jammer is studied. In this game, the users (resp. jammer) want to maximize (resp. minimize) the expected rewards of the users taking into account various factors such as communication rate, hopping cost, and jamming loss. We analyze the arms race of the game and derive an optimal frequency hopping policy at each stage of the arms race based on the Markov decision process (MDP). It is analytically shown that the arms race reaches an equilibrium after a few rounds, and a frequency hopping policy and a jamming strategy at the equilibrium are characterized. We propose two kinds of collision avoidance protocols to ensure that at most one user communicates in each frequency band, and provide various numerical results that show the effects of the reward parameters and collision avoidance protocols on the optimal frequency hopping policy and the expected rewards at the equilibrium. Moreover, we discuss equilibria for the case where the jammer adopts some unpredictable jamming strategies.
    Distributed Learning for Vehicular Dynamic Spectrum Access in Autonomous Driving. (arXiv:2204.10179v1 [cs.NI])
    Reliable wireless communication between autonomously driving cars is one of the fundamental needs for guaranteeing passenger safety and comfort. However, when the number of communicating cars increases, the transmission quality may be significantly degraded due to an excessively high occupancy ratio of the used frequency band. In this paper, we concentrate on the autonomous vehicle-platooning use-case, where intra-platoon communication is done in a dynamically selected frequency band, other than the one nominally devoted to such purposes. The carrier selection is done in a flexible manner with the support of the context database located at the roadside unit (edge of wireless communication infrastructure). However, as the database delivers only context information to the platoons' leaders, the final decision is made separately by the individual platoons, following the suggestions made by the artificial intelligence algorithms. In this work, we concentrate on a lightweight Q-learning solution that could be successfully implemented in each car for dynamic channel selection.
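    A lightweight tabular Q-learning loop for channel selection might look like the sketch below. This is a generic illustration, not the authors' algorithm: the state (the channel used in the previous step), the binary reward (whether a transmission succeeded), the `channel_quality` success probabilities and all hyperparameters are assumptions made for the example.

```python
import random


def select_channel(q_row, epsilon, rng):
    """Epsilon-greedy choice: explore with probability epsilon, else pick the best-valued channel."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    best_next = max(q[next_state])
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])


def train(channel_quality, episodes=2000, epsilon=0.1, seed=0):
    """Learn a Q-table where the state is the previously used channel (an assumption)."""
    rng = random.Random(seed)
    n = len(channel_quality)
    q = [[0.0] * n for _ in range(n)]
    state = 0
    for _ in range(episodes):
        action = select_channel(q[state], epsilon, rng)
        # Hypothetical environment: transmission succeeds with the channel's quality.
        reward = 1.0 if rng.random() < channel_quality[action] else 0.0
        q_update(q, state, action, reward, next_state=action)
        state = action
    return q


q_table = train([0.2, 0.9, 0.5])
```

    The table has only one entry per (previous channel, candidate channel) pair, which is what keeps this kind of learner small enough to run on an in-vehicle unit.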
    Merging of neural networks. (arXiv:2204.09973v1 [cs.LG])
    We propose a simple scheme for merging two neural networks trained with different starting initialization into a single one with the same size as the original ones. We do this by carefully selecting channels from each input network. Our procedure might be used as a finalization step after one tries multiple starting seeds to avoid an unlucky one. We also show that training two networks and merging them leads to better performance than training a single network for an extended period of time. Availability: https://github.com/fmfi-compbio/neural-network-merging
    OUR-GAN: One-shot Ultra-high-Resolution Generative Adversarial Networks. (arXiv:2202.13799v2 [cs.CV] UPDATED)
    We propose OUR-GAN, the first one-shot ultra-high-resolution (UHR) image synthesis framework that generates non-repetitive images with 4K or higher resolution from a single training image. OUR-GAN generates a visually coherent image at low resolution and then gradually increases the resolution by super-resolution. Since OUR-GAN learns from a real UHR image, it can synthesize large-scale shapes with fine details while maintaining long-range coherence, which is difficult with conventional generative models that generate large images based on the patch distribution learned from relatively small images. OUR-GAN applies seamless subregion-wise super-resolution that synthesizes 4K or higher UHR images with limited memory, preventing discontinuity at the boundary. Additionally, OUR-GAN improves visual coherence while maintaining diversity by adding vertical positional embeddings to the feature maps. In experiments on the ST4K and RAISE datasets, OUR-GAN exhibited improved fidelity, visual coherency, and diversity compared with existing methods. The synthesized images are presented at https://anonymous-62348.github.io.
    A System for Interactive Examination of Learned Security Policies. (arXiv:2204.01126v2 [cs.CR] UPDATED)
    We present a system for interactive examination of learned security policies. It allows a user to traverse episodes of Markov decision processes in a controlled manner and to track the actions triggered by security policies. Similar to a software debugger, a user can continue or halt an episode at any time step and inspect parameters and probability distributions of interest. The system enables insight into the structure of a given policy and into the behavior of a policy in edge cases. We demonstrate the system with a network intrusion use case. We examine the evolution of an IT infrastructure's state and the actions prescribed by security policies while an attack occurs. The policies for the demonstration have been obtained through a reinforcement learning approach that includes a simulation system where policies are incrementally learned and an emulation system that produces statistics that drive the simulation runs.
    INSPIRE: Distributed Bayesian Optimization for ImproviNg SPatIal REuse in Dense WLANs. (arXiv:2204.10184v1 [cs.NI])
    WLANs, which have overtaken wired networks to become the primary means of connecting devices to the Internet, are prone to performance issues due to the scarcity of space in the radio spectrum. As a response, IEEE 802.11ax and subsequent amendments aim at increasing the spatial reuse of a radio channel by allowing the dynamic update of two key parameters in wireless transmission: the transmission power (TX_POWER) and the sensitivity threshold (OBSS_PD). In this paper, we present INSPIRE, a distributed solution performing local Bayesian optimizations based on Gaussian processes to improve the spatial reuse in WLANs. INSPIRE makes no explicit assumptions about the topology of WLANs and favors altruistic behaviors of the access points, leading them to find adequate configurations of their TX_POWER and OBSS_PD parameters for the "greater good" of the WLANs. We demonstrate the superiority of INSPIRE over other state-of-the-art strategies using the ns-3 simulator and two examples inspired by real-life deployments of dense WLANs. Our results show that, in only a few seconds, INSPIRE is able to drastically increase the quality of service of operational WLANs by improving their fairness and throughput.
    BABD: A Bitcoin Address Behavior Dataset for Address Behavior Pattern Analysis. (arXiv:2204.05746v2 [cs.CR] UPDATED)
    Cryptocurrencies are no longer just the preferred option for cybercriminal activities on darknets, due to the increasing adoption in mainstream applications. This is partly due to the transparency associated with the underpinning ledgers, where any individual can access the record of a transaction on the public ledger. In this paper, we build a dataset comprising Bitcoin transactions between 12 July 2019 and 26 May 2021. This dataset (hereafter referred to as BABD-13) contains 13 types of Bitcoin addresses, 5 categories of indicators with 148 features, and 544,462 labeled data points. We then use our proposed dataset on common machine learning models, namely: k-nearest neighbors algorithm, decision tree, random forest, multilayer perceptron, and XGBoost. The results show that the accuracy rates of these machine learning models on our proposed dataset are between 93.24% and 96.71%. We also analyze the proposed features and their relationships from the experiments, and propose a k-hop subgraph generation algorithm to extract a k-hop subgraph from the entire Bitcoin transaction graph constructed by the directed heterogeneous multigraph starting from a specific Bitcoin address node (e.g., a known transaction associated with a criminal investigation).
    How Well Do Sparse Imagenet Models Transfer?. (arXiv:2111.13445v5 [cs.CV] UPDATED)
    Transfer learning is a classic paradigm by which models pretrained on large "upstream" datasets are adapted to yield good results on "downstream" specialized datasets. Generally, more accurate models on the "upstream" dataset tend to provide better transfer accuracy "downstream". In this work, we perform an in-depth investigation of this phenomenon in the context of convolutional neural networks (CNNs) trained on the ImageNet dataset, which have been pruned - that is, compressed by sparsifying their connections. We consider transfer using unstructured pruned models obtained by applying several state-of-the-art pruning methods, including magnitude-based, second-order, re-growth, lottery-ticket, and regularization approaches, in the context of twelve standard transfer tasks. In a nutshell, our study shows that sparse models can match or even outperform the transfer performance of dense models, even at high sparsities, and, while doing so, can lead to significant inference and even training speedups. At the same time, we observe and analyze significant differences in the behaviour of different pruning methods.
    Towards Resolving Propensity Contradiction in Offline Recommender Learning. (arXiv:1910.07295v6 [stat.ML] UPDATED)
    We study offline recommender learning from explicit rating feedback in the presence of selection bias. A current promising solution for the bias is the inverse propensity score (IPS) estimation. However, the performance of existing propensity-based methods can suffer significantly from the propensity estimation bias. In fact, most of the previous IPS-based methods require some amount of missing-completely-at-random (MCAR) data to accurately estimate the propensity. This leads to a critical self-contradiction; IPS is ineffective without MCAR data, even though it originally aims to learn recommenders from only missing-not-at-random feedback. To resolve this propensity contradiction, we derive a propensity-independent generalization error bound and propose a novel algorithm to minimize the theoretical bound via adversarial learning. Our theory and algorithm do not require a propensity estimation procedure, thereby leading to a well-performing rating predictor without the true propensity information. Extensive experiments demonstrate that the proposed approach is superior to a range of existing methods both in rating prediction and ranking metrics in practical settings without MCAR data.  ( 2 min )
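    For reference, the inverse propensity score estimator that this abstract builds on can be written out as follows. This is the standard textbook form, not a formula taken from the abstract; the symbols ($O_{u,i}$ for the observation indicator, $\theta_{u,i}$ for the propensity, $\ell$ for the per-rating loss, $\mathcal{D}$ for the set of all user-item pairs) are assumptions introduced for illustration.

$$\hat{\mathcal{L}}_{\mathrm{IPS}}(\hat{r}) = \frac{1}{|\mathcal{D}|} \sum_{(u,i) \in \mathcal{D}} \frac{O_{u,i}}{\theta_{u,i}} \, \ell(r_{u,i}, \hat{r}_{u,i}), \qquad \theta_{u,i} = P(O_{u,i} = 1)$$

    Because $\mathbb{E}[O_{u,i}] = \theta_{u,i}$, the estimator is unbiased for the full-data loss when the true propensities are used; a misestimated $\theta_{u,i}$ generally breaks this property, which is the bias the abstract refers to.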
    Energy-Efficient Parking Analytics System using Deep Reinforcement Learning. (arXiv:2202.08973v2 [cs.CV] UPDATED)
    Advances in deep vision techniques and ubiquity of smart cameras will drive the next generation of video analytics. However, video analytics applications consume vast amounts of energy as both deep learning techniques and cameras are power-hungry. In this paper, we focus on a parking video analytics platform and propose RL-CamSleep, a deep reinforcement learning-based technique, to actuate the cameras to reduce the energy footprint while retaining the system's utility. Our key insight is that many video-analytics applications do not always need to be operational, and we can design policies to activate video analytics only when necessary. Moreover, our work is complementary to existing work that focuses on improving hardware and software efficiency. We evaluate our approach on a city-scale parking dataset having 76 streets spread across the city. Our analysis demonstrates how streets have various parking patterns, highlighting the importance of an adaptive policy. Our approach can learn such an adaptive policy that can reduce the average energy consumption by 76.38% and achieve an average accuracy of more than 98% in performing video analytics.
    The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization. (arXiv:2110.07732v3 [cs.LG] UPDATED)
    Despite progress across a broad range of applications, Transformers have limited success in systematic generalization. The situation is especially frustrating in the case of algorithmic tasks, where they often fail to find intuitive solutions that route relevant information to the right node/operation at the right time in the grid represented by Transformer columns. To facilitate the learning of useful control flow, we propose two modifications to the Transformer architecture, copy gate and geometric attention. Our novel Neural Data Router (NDR) achieves 100% length generalization accuracy on the classic compositional table lookup task, as well as near-perfect accuracy on the simple arithmetic task and a new variant of ListOps testing for generalization across computational depths. NDR's attention and gating patterns tend to be interpretable as an intuitive form of neural routing. Our code is public.
    On the Certified Robustness for Ensemble Models and Beyond. (arXiv:2107.10873v2 [cs.LG] UPDATED)
    Recent studies show that deep neural networks (DNN) are vulnerable to adversarial examples, which aim to mislead DNNs by adding perturbations with small magnitude. To defend against such attacks, both empirical and theoretical defense approaches have been extensively studied for a single ML model. In this work, we aim to analyze and provide the certified robustness for ensemble ML models, together with the sufficient and necessary conditions of robustness for different ensemble protocols. Although ensemble models are empirically shown to be more robust than a single model, we surprisingly find that, in terms of certified robustness, standard ensemble models achieve only marginal improvement over a single model. Thus, to explore the conditions that guarantee certifiably robust ensemble ML models, we first prove that diversified gradient and large confidence margin are sufficient and necessary conditions for certifiably robust ensemble models under the model-smoothness assumption. We then provide the bounded model-smoothness analysis based on the proposed Ensemble-before-Smoothing strategy. We also prove that an ensemble model can always achieve higher certified robustness than a single base model under mild conditions. Inspired by the theoretical findings, we propose the lightweight Diversity Regularized Training (DRT) to train certifiably robust ensemble ML models. Extensive experiments show that our DRT enhanced ensembles can consistently achieve higher certified robustness than existing single and ensemble ML models, demonstrating the state-of-the-art certified L2-robustness on MNIST, CIFAR-10, and ImageNet datasets.  ( 2 min )
    A Survey and Perspective on Artificial Intelligence for Security-Aware Electronic Design Automation. (arXiv:2204.09579v2 [cs.LG] UPDATED)
    Artificial intelligence (AI) and machine learning (ML) techniques have been increasingly used in several fields to improve performance and the level of automation. In recent years, this use has exponentially increased due to the advancement of high-performance computing and the ever increasing size of data. One such field is hardware design, specifically the design of digital and analog integrated circuits (ICs), where AI/ML techniques have been extensively used to address ever-increasing design complexity, aggressive time-to-market, and the growing number of ubiquitous interconnected devices (IoT). However, the security concerns and issues related to IC design have been highly overlooked. In this paper, we summarize the state-of-the-art in AI/ML for circuit design/optimization, security and engineering challenges, research in security-aware CAD/EDA, and future research directions and needs for using AI/ML for security-aware circuit design.  ( 2 min )
    ESS: Learning Event-based Semantic Segmentation from Still Images. (arXiv:2203.10016v1 [cs.CV] CROSS LISTED)
    Retrieving accurate semantic information in challenging high dynamic range (HDR) and high-speed conditions remains an open challenge for image-based algorithms due to severe image degradations. Event cameras promise to address these challenges since they feature a much higher dynamic range and are resilient to motion blur. Nonetheless, semantic segmentation with event cameras is still in its infancy which is chiefly due to the novelty of the sensor, and the lack of high-quality, labeled datasets. In this work, we introduce ESS, which tackles this problem by directly transferring the semantic segmentation task from existing labeled image datasets to unlabeled events via unsupervised domain adaptation (UDA). Compared to existing UDA methods, our approach aligns recurrent, motion-invariant event embeddings with image embeddings. For this reason, our method neither requires video data nor per-pixel alignment between images and events and, crucially, does not need to hallucinate motion from still images. Additionally, to spur further research in event-based semantic segmentation, we introduce DSEC-Semantic, the first large-scale event-based dataset with fine-grained labels. We show that using image labels alone, ESS outperforms existing UDA approaches, and when combined with event labels, it even outperforms state-of-the-art supervised approaches on both DDD17 and DSEC-Semantic. Finally, ESS is general-purpose, which unlocks the vast amount of existing labeled image datasets and paves the way for new and exciting research directions in new fields previously inaccessible for event cameras.  ( 2 min )
    Modeling and Predicting Popularity Dynamics via Deep Learning Attention Mechanism. (arXiv:1811.02117v2 [cs.SI] UPDATED)
    The ability to predict the popularity dynamics of individual items within a complex evolving system has important implications in a wide range of domains. Here we propose a deep learning attention mechanism to model the process through which individual items gain their popularity. We analyze the interpretability of the model with the four key phenomena confirmed independently in the previous studies of long-term popularity dynamics quantification, including the intrinsic quality, the aging effect, the recency effect and the Matthew effect. We analyze the effectiveness of introducing an attention model in popularity dynamics prediction. Extensive experiments on a large real-world citation data set demonstrate that the designed deep learning attention mechanism possesses remarkable power at predicting the long-term popularity dynamics. It consistently outperforms the existing methods, and achieves a significant performance improvement.  ( 2 min )
    Deep Bayesian Active Learning, A Brief Survey on Recent Advances. (arXiv:2012.08044v2 [cs.LG] UPDATED)
    Active learning frameworks offer efficient data annotation without remarkable accuracy degradation. In other words, active learning starts training the model with a small amount of labeled data while exploring the space of unlabeled data in order to select the most informative samples to be labeled. Generally speaking, representing the uncertainty is crucial in any active learning framework; however, deep learning methods are not capable of either representing or manipulating model uncertainty. On the other hand, from the real-world application perspective, uncertainty representation is getting more and more attention in the machine learning community. Deep Bayesian active learning frameworks, and generally any Bayesian active learning settings, provide practical consideration in the model which allows training with small data while representing the model uncertainty for further efficient training. In this paper, we briefly survey recent advances in Bayesian active learning and in particular deep Bayesian active learning frameworks.
    Addressing Tactic Volatility in Self-Adaptive Systems Using Evolved Recurrent Neural Networks and Uncertainty Reduction Tactics. (arXiv:2204.10308v1 [cs.LG])
    Self-adaptive systems frequently use tactics to perform adaptations. Tactic examples include the implementation of additional security measures when an intrusion is detected, or activating a cooling mechanism when temperature thresholds are surpassed. Tactic volatility occurs in real-world systems and is defined as variable behavior in the attributes of a tactic, such as its latency or cost. A system's inability to effectively account for tactic volatility adversely impacts its efficiency and resiliency against the dynamics of real-world environments. To enable systems' efficiency against tactic volatility, we propose a Tactic Volatility Aware (TVA-E) process utilizing evolved Recurrent Neural Networks (eRNN) to provide accurate tactic predictions. TVA-E is also the first known process to take advantage of uncertainty reduction tactics to provide additional information to the decision-making process and reduce uncertainty. TVA-E easily integrates into popular adaptation processes enabling it to immediately benefit a large number of existing self-adaptive systems. Simulations using 52,106 tactic records demonstrate that: I) eRNN is an effective prediction mechanism, II) TVA-E represents an improvement over existing state-of-the-art processes in accounting for tactic volatility, and III) Uncertainty reduction tactics are beneficial in accounting for tactic volatility. The developed dataset and tool can be found at https://tacticvolatility.github.io/
    Lessons on Parameter Sharing across Layers in Transformers. (arXiv:2104.06022v3 [cs.CL] UPDATED)
    We propose a parameter sharing method for Transformers (Vaswani et al., 2017). The proposed approach relaxes a widely used technique, which shares the parameters of one layer with all layers, as in Universal Transformers (Dehghani et al., 2019), to increase the efficiency in the computational time. We propose three strategies: Sequence, Cycle, and Cycle (rev) to assign parameters to each layer. Experimental results show that the proposed strategies are efficient in the parameter size and computational time. Moreover, we indicate that the proposed strategies are also effective in the configuration where we use a large amount of training data, such as the recent WMT competition.  ( 2 min )
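    The abstract names the three assignment strategies without defining them. The sketch below shows one plausible reading based on the names alone: with `n_params` distinct parameter sets shared across `n_layers` layers, Sequence gives consecutive layers the same set, Cycle repeats the sets in order, and Cycle (rev) repeats them in order but reverses the final pass. The function names, the zero-based indices and the divisibility requirement are all assumptions of this example.

```python
def _check(n_layers, n_params):
    """Reject configurations this sketch does not handle."""
    if n_params <= 0 or n_layers % n_params != 0:
        raise ValueError("n_layers must be a positive multiple of n_params")


def sequence(n_layers, n_params):
    """Consecutive layers share a parameter set: 0, 0, 1, 1, 2, 2."""
    _check(n_layers, n_params)
    group = n_layers // n_params
    return [layer // group for layer in range(n_layers)]


def cycle(n_layers, n_params):
    """Parameter sets repeat in order: 0, 1, 2, 0, 1, 2."""
    _check(n_layers, n_params)
    return [layer % n_params for layer in range(n_layers)]


def cycle_rev(n_layers, n_params):
    """Like cycle, but the last pass runs in reverse order: 0, 1, 2, 2, 1, 0."""
    _check(n_layers, n_params)
    head = [layer % n_params for layer in range(n_layers - n_params)]
    tail = list(range(n_params - 1, -1, -1))
    return head + tail


# Each list maps a layer index to the index of the parameter set it reuses.
assignments = {
    "sequence": sequence(6, 3),
    "cycle": cycle(6, 3),
    "cycle_rev": cycle_rev(6, 3),
}
```

    Setting `n_params` to 1 recovers the fully shared case that the abstract describes as the technique being relaxed.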
    DropMessage: Unifying Random Dropping for Graph Neural Networks. (arXiv:2204.10037v1 [cs.LG])
    Graph Neural Networks (GNNs) are powerful tools for graph representation learning. Despite their rapid development, GNNs also face some challenges, such as over-fitting, over-smoothing, and non-robustness. Previous works indicate that these problems can be alleviated by random dropping methods, which integrate noise into models by randomly masking parts of the input. However, some open problems of random dropping on GNNs remain to be solved. First, it is challenging to find a universal method that is suitable for all cases considering the divergence of different datasets and models. Second, random noise introduced to GNNs causes incomplete coverage of parameters and an unstable training process. In this paper, we propose a novel random dropping method called DropMessage, which performs dropping operations directly on the message matrix and can be applied to any message-passing GNN. Furthermore, we elaborate the superiority of DropMessage: it stabilizes the training process by reducing sample variance; it keeps information diversity from the perspective of information theory, which makes it a theoretical upper bound of other methods. Also, we unify existing random dropping methods into our framework and analyze their effects on GNNs. To evaluate our proposed method, we conduct experiments that aim at multiple tasks on five public datasets and two industrial datasets with various backbone models. The experimental results show that DropMessage has the advantages of both effectiveness and generalization.  ( 2 min )
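    Dropping directly on the message matrix can be pictured with the sketch below. It assumes one message row per edge, independent element-wise Bernoulli masking, and the usual 1/(1 - p) rescaling from standard dropout; the abstract confirms only that dropping is applied to the message matrix, so the remaining details and all names here are illustrative.

```python
import random


def drop_message(messages, drop_rate, rng):
    """Zero each message element with probability drop_rate and rescale the survivors."""
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    scale = 1.0 / (1.0 - drop_rate)
    return [
        [0.0 if rng.random() < drop_rate else value * scale for value in row]
        for row in messages
    ]


def aggregate(messages, targets, num_nodes):
    """Sum-aggregate each edge's message into its target node."""
    dim = len(messages[0])
    out = [[0.0] * dim for _ in range(num_nodes)]
    for row, target in zip(messages, targets):
        for k, value in enumerate(row):
            out[target][k] += value
    return out


# Three edges carrying 2-dimensional messages into two nodes (toy data).
edge_messages = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
edge_targets = [0, 1, 0]
noisy = drop_message(edge_messages, drop_rate=0.5, rng=random.Random(0))
node_inputs = aggregate(noisy, edge_targets, num_nodes=2)
```

    Because the mask is drawn per message element rather than per node, edge or feature column, coarser dropping schemes correspond to masks that are constant along rows or columns of the same matrix.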
    STONet: A Neural-Operator-Driven Spatio-temporal Network. (arXiv:2204.08414v2 [cs.LG] UPDATED)
    Graph-based spatio-temporal neural networks are effective to model the spatial dependency among discrete points sampled irregularly from unstructured grids, thanks to the great expressiveness of graph neural networks. However, these models are usually spatially-transductive -- only fitting the signals for discrete spatial nodes fed in models but unable to generalize to "unseen" spatial points in a zero-shot manner. In comparison, for forecasting tasks on continuous space such as temperature prediction on the earth's surface, the spatially-inductive property allows the model to generalize to any point in the spatial domain, demonstrating models' ability to learn the underlying mechanisms or physics laws of the systems, rather than simply fit the signals. Besides, in temporal domains, irregularly-sampled time series, e.g. data with missing values, urge models to be temporally-continuous. Motivated by the two issues, we propose a spatio-temporal framework based on neural operators for PDEs, which learn the underlying mechanisms governing the dynamics of spatially-continuous physical quantities. Experiments show our model's improved performance on forecasting spatially-continuous physical quantities, and its superior generalization to unseen spatial points and ability to handle temporally-irregular data.  ( 2 min )
    MRAM-based Analog Sigmoid Function for In-memory Computing. (arXiv:2204.09918v1 [cs.ET])
    We propose an analog implementation of the transcendental activation function leveraging two spin-orbit torque magnetoresistive random-access memory (SOT-MRAM) devices and a CMOS inverter. The proposed analog neuron circuit consumes 1.8-27x less power, and occupies 2.5-4931x smaller area, compared to the state-of-the-art analog and digital implementations. Moreover, the developed neuron can be readily integrated with memristive crossbars without requiring any intermediate signal conversion units. The architecture-level analyses show that a fully-analog in-memory computing (IMC) circuit that uses our SOT-MRAM neuron along with an SOT-MRAM based crossbar can achieve more than 1.1x, 12x, and 13.3x reduction in power, latency, and energy, respectively, compared to a mixed-signal implementation with analog memristive crossbars and digital neurons. Finally, through cross-layer analyses, we provide a guide on how varying the device-level parameters in our neuron can affect the accuracy of a multilayer perceptron (MLP) for MNIST classification.  ( 2 min )
    Scale Dependencies and Self-Similarity Through Wavelet Scattering Covariance. (arXiv:2204.10177v1 [physics.data-an])
    We introduce a scattering covariance matrix which provides non-Gaussian models of time-series having stationary increments. A complex wavelet transform computes signal variations at each scale. Dependencies across scales are captured by the joint covariance across time and scales of complex wavelet coefficients and their modulus. This covariance is nearly diagonalized by a second wavelet transform, which defines the scattering covariance. We show that this set of moments characterizes a wide range of non-Gaussian properties of multi-scale processes. This is analyzed for a variety of processes, including fractional Brownian motions, Poisson, multifractal random walks and Hawkes processes. We prove that self-similar processes have a scattering covariance matrix which is scale invariant. This property can be estimated numerically and defines a class of wide-sense self-similar processes. We build maximum entropy models conditioned by scattering covariance coefficients, and generate new time-series with a microcanonical sampling algorithm. Applications are shown for highly non-Gaussian financial and turbulence time-series.  ( 2 min )
    NetSentry: A Deep Learning Approach to Detecting Incipient Large-scale Network Attacks. (arXiv:2202.09873v2 [cs.CR] UPDATED)
    Machine Learning (ML) techniques are increasingly adopted to tackle ever-evolving high-profile network attacks, including DDoS, botnet, and ransomware, due to their unique ability to extract complex patterns hidden in data streams. These approaches are, however, routinely validated with data collected in the same environment, and their performance degrades when deployed in different network topologies and/or applied to previously unseen traffic, as we uncover. This suggests malicious/benign behaviors are largely learned superficially and ML-based Network Intrusion Detection Systems (NIDS) need revisiting to be effective in practice. In this paper we dive into the mechanics of large-scale network attacks, with a view to understanding how to use ML for Network Intrusion Detection (NID) in a principled way. We reveal that, although cyberattacks vary significantly in terms of payloads, vectors and targets, their early stages, which are critical to successful attack outcomes, share many similarities and exhibit important temporal correlations. Therefore, we treat NID as a time-sensitive task and propose NetSentry, perhaps the first NIDS of its kind that builds on Bidirectional Asymmetric LSTM (Bi-ALSTM), an original ensemble of sequential neural models, to detect network threats before they spread. We cross-evaluate NetSentry using two practical datasets, training on one and testing on the other, and demonstrate F1 score gains above 33% over the state-of-the-art, as well as up to 3 times higher rates of detecting attacks such as XSS and web bruteforce. Further, we put forward a novel data augmentation technique that boosts the generalization abilities of a broad range of supervised deep learning algorithms, leading to average F1 score gains above 35%.  ( 2 min )
    Infographics Wizard: Flexible Infographics Authoring and Design Exploration. (arXiv:2204.09904v1 [cs.HC])
    Infographics are an aesthetic visual representation of information following specific design principles of human perception. Designing infographics can be a tedious process for non-experts and time-consuming, even for professional designers. With the help of designers, we propose a semi-automated infographic framework for general structured and flow-based infographic design generation. For novice designers, our framework automatically creates and ranks infographic designs for a user-provided text with no requirement for design input. However, expert designers can still provide custom design inputs to customize the infographics. We will also contribute an individual visual group (VG) designs dataset (in SVG), along with a 1k complete infographic image dataset with segmented VGs in this work. Evaluation results confirm that by using our framework, designers from all expertise levels can generate generic infographic designs faster than existing methods while maintaining the same quality as hand-designed infographics templates.  ( 2 min )
    From Stars to Subgraphs: Uplifting Any GNN with Local Structure Awareness. (arXiv:2110.03753v3 [cs.LG] UPDATED)
    Message Passing Neural Networks (MPNNs) are a common type of Graph Neural Network (GNN), in which each node's representation is computed recursively by aggregating representations (messages) from its immediate neighbors akin to a star-shaped pattern. MPNNs are appealing for being efficient and scalable; however, their expressiveness is upper-bounded by the 1st-order Weisfeiler-Lehman isomorphism test (1-WL). In response, prior works propose highly expressive models at the cost of scalability and sometimes generalization performance. Our work stands between these two regimes: we introduce a general framework to uplift any MPNN to be more expressive, with limited scalability overhead and greatly improved practical performance. We achieve this by extending local aggregation in MPNNs from star patterns to general subgraph patterns (e.g., k-egonets): in our framework, each node representation is computed as the encoding of a surrounding induced subgraph rather than encoding of immediate neighbors only (i.e. a star). We choose the subgraph encoder to be a GNN (mainly MPNNs, considering scalability) to design a general framework that serves as a wrapper to uplift any GNN. We call our proposed method GNN-AK (GNN As Kernel), as the framework resembles a convolutional neural network by replacing the kernel with GNNs. Theoretically, we show that our framework is strictly more powerful than 1&2-WL, and is not less powerful than 3-WL. We also design subgraph sampling strategies which greatly reduce memory footprint and improve speed while maintaining performance. Our method sets new state-of-the-art performance by large margins for several well-known graph ML tasks; specifically, 0.08 MAE on ZINC, and 74.79% and 86.887% accuracy on CIFAR10 and PATTERN respectively.  ( 2 min )
    CNLL: A Semi-supervised Approach For Continual Noisy Label Learning. (arXiv:2204.09881v1 [cs.CV])
    The task of continual learning requires careful design of algorithms that can tackle catastrophic forgetting. However, the noisy label, which is inevitable in a real-world scenario, seems to exacerbate the situation. While very few studies have addressed the issue of continual learning under noisy labels, long training time and complicated training schemes limit their applications in most cases. In contrast, we propose a simple purification technique to effectively cleanse the online data stream that is both cost-effective and more accurate. After purification, we perform fine-tuning in a semi-supervised fashion that ensures the participation of all available samples. Training in this fashion helps us learn a better representation that results in state-of-the-art (SOTA) performance. Through extensive experimentation on 3 benchmark datasets, MNIST, CIFAR10 and CIFAR100, we show the effectiveness of our proposed approach. We achieve a 24.8% performance gain for CIFAR10 with 20% noise over previous SOTA methods. Our code is publicly available.  ( 2 min )
    Holmes: An Efficient and Lightweight Semantic Based Anomalous Email Detector. (arXiv:2104.08044v11 [cs.CR] UPDATED)
    Email threats are a serious issue for enterprise security and cover various malicious scenarios, such as phishing, fraud, blackmail and malvertisement. A traditional anti-spam gateway commonly needs to maintain a greylist to filter out unexpected emails based on suspicious vocabulary in the mail subject and content. However, this signature-based approach cannot effectively discover novel and unknown suspicious emails that exploit current hot topics, such as COVID-19 and the US election. To address the problem, in this paper, we present Holmes, an efficient and lightweight semantic based engine for anomalous email detection. Holmes converts each email event log to a sentence through word embedding and then extracts interesting items among them by novelty detection. Based on our observations, we claim that, in an enterprise environment, there is a stable relation between senders and receivers, but suspicious emails commonly come from unusual sources, which can be detected through rareness selection. We evaluate the performance of Holmes in a real-world enterprise environment, which sends and receives around 5,000 emails each day. As a result, Holmes can achieve a high detection rate (outputting around 200 suspicious emails per day) and maintain a low false alarm rate for anomaly detection.  ( 3 min )
    Neural Topic Modeling of Psychotherapy Sessions. (arXiv:2204.10189v1 [cs.CL])
    In this work, we compare different neural topic modeling methods in learning the topical propensities of different psychiatric conditions from the psychotherapy session transcripts parsed from speech recordings. We also incorporate temporal modeling to put this additional interpretability to action by parsing out topic similarities as a time series in a turn-level resolution. We believe this topic modeling framework can offer interpretable insights for the therapist to optimally decide his or her strategy and improve the psychotherapy effectiveness.  ( 2 min )
    Theoretical Analysis of Self-Training with Deep Networks on Unlabeled Data. (arXiv:2010.03622v5 [cs.LG] UPDATED)
    Self-training algorithms, which train a model to fit pseudolabels predicted by another previously-learned model, have been very successful for learning with unlabeled data using neural networks. However, the current theoretical understanding of self-training only applies to linear models. This work provides a unified theoretical analysis of self-training with deep networks for semi-supervised learning, unsupervised domain adaptation, and unsupervised learning. At the core of our analysis is a simple but realistic "expansion" assumption, which states that a low probability subset of the data must expand to a neighborhood with large probability relative to the subset. We also assume that neighborhoods of examples in different classes have minimal overlap. We prove that under these assumptions, the minimizers of population objectives based on self-training and input-consistency regularization will achieve high accuracy with respect to ground-truth labels. By using off-the-shelf generalization bounds, we immediately convert this result to sample complexity guarantees for neural nets that are polynomial in the margin and Lipschitzness. Our results help explain the empirical successes of recently proposed self-training algorithms which use input consistency regularization.  ( 2 min )
    Why I'm not Answering: Understanding Determinants of Classification of an Abstaining Classifier for Cancer Pathology Reports. (arXiv:2009.05094v5 [cs.LG] UPDATED)
    Safe deployment of deep learning systems in critical real world applications requires models to make very few mistakes, and only under predictable circumstances. In this work, we address this problem using an abstaining classifier that is tuned to have >95% accuracy, and then identify the determinants of abstention using LIME. Essentially, we are training our model to learn the attributes of pathology reports that are likely to lead to incorrect classifications, albeit at the cost of reduced sensitivity. We demonstrate an abstaining classifier in a multitask setting for classifying cancer pathology reports from the NCI SEER cancer registries on six tasks of interest. For these tasks, we reduce the classification error rate by factors of 2--5 by abstaining on 25--45% of the reports. For the specific task of classifying cancer site, we are able to identify metastasis, reports involving lymph nodes, and discussion of multiple cancer sites as responsible for many of the classification mistakes, and observe that the extent and types of mistakes vary systematically with cancer site (e.g., breast, lung, and prostate). When combining across three of the tasks, our model classifies 50% of the reports with an accuracy greater than 95% for three of the six tasks, and greater than 85% for all six tasks on the retained samples. Furthermore, we show that LIME provides a better determinant of classification than measures of word occurrence alone. By combining a deep abstaining classifier with feature identification using LIME, we are able to identify concepts responsible for both correctness and abstention when classifying cancer sites from pathology reports. The improvement of LIME over keyword searches is statistically significant, presumably because words are assessed in context and have been identified as a local determinant of classification.  ( 3 min )
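    The trade-off between abstention rate and accuracy on the retained samples can be illustrated with a simple confidence threshold. Note that this is a swapped-in technique for illustration only: the abstract describes a deep abstaining classifier that is trained to abstain, not post-hoc thresholding, and the function name, probabilities, labels, and threshold below are all hypothetical.

```python
import numpy as np

def evaluate_with_abstention(probs, labels, threshold):
    """Return (coverage, accuracy on retained samples).

    A sample is retained only if its top predicted probability is at least
    `threshold`; otherwise the classifier abstains on it.
    """
    confidence = probs.max(axis=1)
    predictions = probs.argmax(axis=1)
    retained = confidence >= threshold
    coverage = float(retained.mean())
    if retained.any():
        accuracy = float((predictions[retained] == labels[retained]).mean())
    else:
        accuracy = float("nan")  # nothing retained, accuracy is undefined
    return coverage, accuracy

# Hypothetical class probabilities for four reports and three classes.
probs = np.array([
    [0.97, 0.02, 0.01],
    [0.40, 0.35, 0.25],
    [0.05, 0.90, 0.05],
    [0.50, 0.45, 0.05],
])
labels = np.array([0, 1, 1, 1])

coverage, accuracy = evaluate_with_abstention(probs, labels, threshold=0.8)
full_coverage, full_accuracy = evaluate_with_abstention(probs, labels, threshold=0.0)
```

    In this toy data, abstaining on the two low-confidence reports halves the coverage and removes both misclassifications, which is the same kind of exchange the abstract reports at a larger scale.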
    Learnable Model Augmentation Self-Supervised Learning for Sequential Recommendation. (arXiv:2204.10128v1 [cs.IR])
    Sequential Recommendation aims to predict the next item based on user behaviour. Recently, Self-Supervised Learning (SSL) has been proposed to improve recommendation performance. However, most existing SSL methods use a uniform data augmentation scheme, which loses the sequence correlation of the original sequence. To this end, in this paper, we propose Learnable Model Augmentation self-supervised learning for sequential Recommendation (LMA4Rec). Specifically, LMA4Rec first takes model augmentation as a supplementary method for data augmentation to generate views. Then, LMA4Rec uses learnable Bernoulli dropout to make the model augmentation operations learnable. Next, self-supervised learning is used between the contrastive views to extract self-supervised signals from the original sequence. Finally, experiments on three public datasets show that the LMA4Rec method effectively improves sequential recommendation performance compared with baseline methods.  ( 2 min )
    TorchSparse: Efficient Point Cloud Inference Engine. (arXiv:2204.10319v1 [cs.LG])
    Deep learning on point clouds has received increased attention thanks to its wide applications in AR/VR and autonomous driving. These applications require low latency and high accuracy to provide real-time user experience and ensure user safety. Unlike conventional dense workloads, the sparse and irregular nature of point clouds poses severe challenges to running sparse CNNs efficiently on the general-purpose hardware. Furthermore, existing sparse acceleration techniques for 2D images do not translate to 3D point clouds. In this paper, we introduce TorchSparse, a high-performance point cloud inference engine that accelerates the sparse convolution computation on GPUs. TorchSparse directly optimizes the two bottlenecks of sparse convolution: irregular computation and data movement. It applies adaptive matrix multiplication grouping to trade computation for better regularity, achieving 1.4-1.5x speedup for matrix multiplication. It also optimizes the data movement by adopting vectorized, quantized and fused locality-aware memory access, reducing the memory movement cost by 2.7x. Evaluated on seven representative models across three benchmark datasets, TorchSparse achieves 1.6x and 1.5x measured end-to-end speedup over the state-of-the-art MinkowskiEngine and SpConv, respectively.  ( 2 min )
    Learning Future Object Prediction with a Spatiotemporal Detection Transformer. (arXiv:2204.10321v1 [cs.CV])
    We explore future object prediction -- a challenging problem where all objects visible in a future video frame are to be predicted. We propose to tackle this problem end-to-end by training a detection transformer to directly output future objects. In order to make accurate predictions about the future, it is necessary to capture the dynamics in the scene, both of other objects and of the ego-camera. We extend existing detection transformers in two ways to capture the scene dynamics. First, we experiment with three different mechanisms that enable the model to spatiotemporally process multiple frames. Second, we feed ego-motion information to the model via cross-attention. We show that both of these cues substantially improve future object prediction performance. Our final approach learns to capture the dynamics and makes predictions on par with an oracle for 100 ms prediction horizons, and outperforms baselines for longer prediction horizons.  ( 2 min )
    Exploring Structural Sparsity of Deep Networks via Inverse Scale Spaces. (arXiv:1905.09449v5 [cs.LG] UPDATED)
    The great success of deep neural networks is built upon their over-parameterization, which smooths the optimization landscape without degrading the generalization ability. Despite the benefits of over-parameterization, a huge number of parameters makes deep networks cumbersome in daily life applications. Though techniques such as pruning and distillation have been developed, they are expensive because, as backward selection methods, they require fully training a dense network, and there is still a void in systematically exploring forward selection methods for learning structural sparsity in deep networks. To fill this gap, this paper proposes a new approach based on differential inclusions of inverse scale spaces, which generate a family of models from simple to complex ones along the dynamics via coupling a pair of parameters, such that over-parameterized deep models and their structural sparsity can be explored simultaneously. This kind of differential inclusion scheme has a simple discretization, dubbed Deep structure splitting Linearized Bregman Iteration (DessiLBI), whose global convergence in learning deep networks could be established under the Kurdyka-Lojasiewicz framework. Experimental evidence shows that our method achieves comparable and even better performance than the competitive optimizers in exploring the sparse structure of several widely used backbones on the benchmark datasets. Remarkably, with early stopping, our method unveils "winning tickets" in early epochs: effective sparse network structures with test accuracy comparable to fully trained over-parameterized models, which are further transferable to similar alternative tasks. Furthermore, our method is able to grow networks efficiently with adaptive filter configurations, demonstrating good performance with much less computational cost. Codes and models can be downloaded at https://github.com/DessiLBI2020/DessiLBI.  ( 3 min )
    Multi-Component Optimization and Efficient Deployment of Neural-Networks on Resource-Constrained IoT Hardware. (arXiv:2204.10183v1 [cs.LG])
    The majority of IoT devices like smartwatches, smart plugs, HVAC controllers, etc., are powered by hardware with a constrained specification (low memory, clock speed and processor) which is insufficient to accommodate and execute large, high-quality models. On such resource-constrained devices, manufacturers still manage to provide attractive functionalities (to boost sales) by following the traditional approach of programming IoT devices/products to collect and transmit data (image, audio, sensor readings, etc.) to their cloud-based ML analytics platforms. For decades, this online approach has been facing issues such as compromised data streams, non-real-time analytics due to latency, bandwidth constraints, costly subscriptions, recent privacy issues raised by users and the GDPR guidelines, etc. In this paper, to enable ultra-fast and accurate AI-based offline analytics on resource-constrained IoT devices, we present an end-to-end multi-component model optimization sequence and open-source its implementation. Researchers and developers can use our optimization sequence to optimize memory- and computation-demanding models in multiple aspects in order to produce small, low-latency, low-power models that can comfortably fit and execute on resource-constrained hardware. The experimental results show that our optimization components can produce models that are: (i) compressed 12.06x; (ii) 0.13% to 0.27% more accurate; (iii) orders of magnitude faster, with unit inference at 0.06 ms. Our optimization sequence is generic and can be applied to any state-of-the-art models trained for anomaly detection, predictive maintenance, robotics, voice recognition, and machine vision.  ( 2 min )
    A Sandbox Tool to Bias(Stress)-Test Fairness Algorithms. (arXiv:2204.10233v1 [cs.LG])
    Motivated by the growing importance of reducing unfairness in ML predictions, Fair-ML researchers have presented an extensive suite of algorithmic "fairness-enhancing" remedies. Most existing algorithms, however, are agnostic to the sources of the observed unfairness. As a result, the literature currently lacks guiding frameworks to specify conditions under which each algorithmic intervention can potentially alleviate the underpinning cause of unfairness. To close this gap, we scrutinize the underlying biases (e.g., in the training data or design choices) that cause observational unfairness. We present a bias-injection sandbox tool to investigate fairness consequences of various biases and assess the effectiveness of algorithmic remedies in the presence of specific types of bias. We call this process the bias(stress)-testing of algorithmic interventions. Unlike existing toolkits, ours provides a controlled environment to counterfactually inject biases in the ML pipeline. This stylized setup offers the distinct capability of testing fairness interventions beyond observational data and against an unbiased benchmark. In particular, we can test whether a given remedy can alleviate the injected bias by comparing the predictions resulting after the intervention in the biased setting with true labels in the unbiased regime -- that is, before any bias injection. We illustrate the utility of our toolkit via a proof-of-concept case study on synthetic data. Our empirical analysis showcases the type of insights that can be obtained through our simulations.  ( 2 min )
    Feature anomaly detection system (FADS) for intelligent manufacturing. (arXiv:2204.10318v1 [cs.CV])
    Anomaly detection is important for industrial automation and part quality assurance, and while humans can easily detect anomalies in components given a few examples, designing a generic automated system that can perform at or above human capabilities remains a challenge. In this work, we present a simple new anomaly detection algorithm called FADS (feature-based anomaly detection system) which leverages pretrained convolutional neural networks (CNN) to generate a statistical model of nominal inputs by observing the activation of the convolutional filters. During inference the system compares the convolutional filter activation of the new input to the statistical model and flags activations that are outside the expected range of values and therefore likely an anomaly. By using a pretrained network, FADS demonstrates excellent performance similar to or better than other machine learning approaches to anomaly detection, while requiring no tuning of the CNN weights. We demonstrate FADS's ability by detecting process parameter changes on a custom dataset of additively manufactured lattices. The FADS localization algorithm shows that textural differences that are visible on the surface can be used to detect process parameter changes. In addition, we test FADS on benchmark datasets, such as the MVTec Anomaly Detection dataset, and report good results.  ( 2 min )
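    The idea of modelling nominal filter activations and flagging out-of-range values can be sketched as follows. This is not the FADS algorithm itself: the abstract does not specify the statistical model, so the per-filter mean and standard deviation, the 3-sigma rule, and the random arrays standing in for pooled CNN activations are all assumptions made for illustration.

```python
import numpy as np

def fit_nominal_model(activations):
    """Per-filter mean and standard deviation over nominal inputs.

    `activations` has shape (num_inputs, num_filters), e.g. one pooled
    activation value per convolutional filter of a pretrained CNN.
    """
    mean = activations.mean(axis=0)
    std = activations.std(axis=0) + 1e-8  # avoid division by zero
    return mean, std

def flag_anomalous_filters(activation, mean, std, k=3.0):
    """Boolean mask of filters whose activation lies more than k sigma from nominal."""
    return np.abs(activation - mean) / std > k

rng = np.random.default_rng(0)
nominal = rng.normal(loc=1.0, scale=0.1, size=(200, 8))  # stand-in for real activations
mean, std = fit_nominal_model(nominal)

test_input = mean.copy()
test_input[3] += 10 * std[3]  # simulate one filter responding far outside its range
flags = flag_anomalous_filters(test_input, mean, std)
```

    No CNN weights are updated at any point; only the summary statistics are estimated, which is consistent with the abstract's claim that FADS needs no tuning of the pretrained network.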
    A two-level machine learning framework for predictive maintenance: comparison of learning formulations. (arXiv:2204.10083v1 [cs.LG])
    Predicting incoming failures and scheduling maintenance based on sensor information in industrial machines is increasingly important to avoid downtime and machine failure. Different machine learning formulations can be used to solve the predictive maintenance problem. However, many of the approaches studied in the literature are not directly applicable to real-life scenarios. Indeed, many of those approaches usually either rely on labelled machine malfunctions in the case of classification and fault detection, or rely on finding a monotonic health indicator on which a prediction can be made in the case of regression and remaining useful life estimation, which is not always feasible. Moreover, the decision-making part of the problem is not always studied in conjunction with the prediction phase. This paper aims to design and compare different formulations for predictive maintenance in a two-level framework and design metrics that quantify both the failure detection performance and the timing of the maintenance decision. The first level is responsible for building a health indicator by aggregating features using a learning algorithm. The second level consists of a decision-making system that can trigger an alarm based on this health indicator. Three degrees of refinement are compared in the first level of the framework, from a simple threshold-based univariate predictive technique to supervised learning methods based on the remaining time before failure. We choose to use the Support Vector Machine (SVM) and its variations as the common algorithm used in all the formulations. We apply and compare the different strategies on a real-world rotating machine case study and observe that while a simple model can already perform well, more sophisticated refinements enhance the predictions for well-chosen parameters.  ( 2 min )
    Learning spatiotemporal features from incomplete data for traffic flow prediction using hybrid deep neural networks. (arXiv:2204.10222v1 [cs.LG])
    Urban traffic flow prediction using data-driven models can play an important role in route planning and preventing congestion on highways. These methods utilize data collected from traffic recording stations at different timestamps to predict the future status of traffic. Hence, data collection, transmission, storage, and extraction techniques can have a significant impact on the performance of the traffic flow model. On the other hand, a comprehensive database can provide the opportunity for using complex, yet reliable predictive models such as deep learning methods. However, most of these methods have difficulties in handling missing values and outliers. This study focuses on hybrid deep neural networks to predict traffic flow in the California Freeway Performance Measurement System (PeMS) with missing values. The proposed networks are based on a combination of recurrent neural networks (RNNs) to consider the temporal dependencies in the data recorded in each station and convolutional neural networks (CNNs) to take the spatial correlations in the adjacent stations into account. Various architecture configurations with series and parallel connections are considered based on RNNs and CNNs, and several prevalent data imputation techniques are used to examine the robustness of the hybrid networks to missing values. A comprehensive analysis performed on two different datasets from PeMS indicates that the proposed series-parallel hybrid network with the mean imputation technique achieves the lowest error in predicting the traffic flow and is robust to missing values up to a 21% missing ratio in both complete and incomplete training data scenarios when applied to incomplete test data.  ( 2 min )
    Automated analysis of fibrous cap in intravascular optical coherence tomography images of coronary arteries. (arXiv:2204.10162v1 [cs.LG])
    Thin-cap fibroatheroma (TCFA) and plaque rupture have been recognized as the most frequent risk factors for thrombosis and acute coronary syndrome. Intravascular optical coherence tomography (IVOCT) can identify TCFA and assess cap thickness, which provides an opportunity to assess plaque vulnerability. We developed an automated method that can detect lipidous plaque and assess fibrous cap thickness in IVOCT images. This study analyzed a total of 4,360 IVOCT image frames of 77 lesions among 41 patients. To improve segmentation performance, preprocessing included lumen segmentation, pixel-shifting, and noise filtering on the raw polar (r, theta) IVOCT images. We used the DeepLab-v3 plus deep learning model to classify lipidous plaque pixels. After lipid detection, we automatically detected the outer border of the fibrous cap using a special dynamic programming algorithm and assessed the cap thickness. Our method provided excellent discriminability of lipid plaque with a sensitivity of 85.8% and A-line Dice coefficient of 0.837. By comparing lipid angle measurements between two analysts following editing of our automated software, we found good agreement by Bland-Altman analysis (difference 6.7+/-17 degrees; mean 196 degrees). Our method accurately detected the fibrous cap from the detected lipid plaque. Automated analysis required a significant modification for only 5.5% of frames. Furthermore, our method showed good agreement of fibrous cap thickness between two analysts with Bland-Altman analysis (4.2+/-14.6 microns; mean 175 microns), indicating little bias between users and good reproducibility of the measurement. We developed a fully automated method for fibrous cap quantification in IVOCT images, resulting in good agreement with determinations by analysts. The method has great potential to enable highly automated, repeatable, and comprehensive evaluations of TCFAs.  ( 2 min )
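    For readers unfamiliar with Bland-Altman analysis, the following sketch shows the standard computation of the bias (mean difference) and the 95% limits of agreement between two raters. The cap-thickness values are hypothetical, and the abstract does not state whether its "+/-" figures are standard deviations or limits of agreement, so no attempt is made to reproduce the reported numbers.

```python
import numpy as np

def bland_altman(rater_a, rater_b):
    """Bias and 95% limits of agreement between two sets of paired measurements."""
    a = np.asarray(rater_a, dtype=float)
    b = np.asarray(rater_b, dtype=float)
    diff = a - b
    bias = diff.mean()
    sd = diff.std(ddof=1)  # sample standard deviation of the differences
    return {
        "mean_of_pairs": ((a + b) / 2).mean(),
        "bias": bias,
        "sd": sd,
        "lower_loa": bias - 1.96 * sd,
        "upper_loa": bias + 1.96 * sd,
    }

# Hypothetical fibrous cap thickness readings (microns) from two analysts.
analyst_1 = [170, 182, 175, 168]
analyst_2 = [166, 180, 169, 165]
result = bland_altman(analyst_1, analyst_2)
```

    A bias close to zero relative to the mean of the pairs, together with narrow limits of agreement, is what supports a claim of little systematic difference between users.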
    The NIST CTS Speaker Recognition Challenge. (arXiv:2204.10228v1 [eess.AS])
    The US National Institute of Standards and Technology (NIST) has been conducting a second iteration of the CTS challenge since August 2020. The current iteration of the CTS Challenge is a leaderboard-style speaker recognition evaluation using telephony data extracted from the unexposed portions of the Call My Net 2 (CMN2) and Multi-Language Speech (MLS) corpora collected by the LDC. The CTS Challenge is currently organized in a similar manner to the SRE19 CTS Challenge, offering only an open training condition using two evaluation subsets, namely Progress and Test. Unlike in the SRE19 Challenge, no training or development set was initially released, and NIST has publicly released the leaderboards on both subsets for the CTS Challenge. Which subset (i.e., Progress or Test) a trial belongs to is unknown to challenge participants, and each system submission needs to contain outputs for all of the trials. The CTS Challenge has also served, and will continue to do so, as a prerequisite for entrance to the regular SREs (such as SRE21). Since August 2020, a total of 53 organizations (forming 33 teams) from academia and industry have participated in the CTS Challenge and submitted more than 4400 valid system outputs. This paper presents an overview of the evaluation and several analyses of system performance for some primary conditions in the CTS Challenge. The CTS Challenge results thus far indicate remarkable improvements in performance due to 1) speaker embeddings extracted using large-scale and complex neural network architectures such as ResNets along with angular margin losses for speaker embedding extraction, 2) extensive data augmentation, 3) the use of large amounts of in-house proprietary data from a large number of labeled speakers, 4) long-duration fine-tuning.  ( 2 min )
    Revisiting Gaussian mixture critic in off-policy reinforcement learning: a sample-based approach. (arXiv:2204.10256v1 [cs.LG])
    Actor-critic algorithms that make use of distributional policy evaluation have frequently been shown to outperform their non-distributional counterparts on many challenging control tasks. Examples of this behavior include the D4PG and DMPO algorithms as compared to DDPG and MPO, respectively [Barth-Maron et al., 2018; Hoffman et al., 2020]. However, both agents rely on the C51 critic for value estimation. One major drawback of the C51 approach is its requirement of prior knowledge about the minimum and maximum values a policy can attain as well as the number of bins used, which fixes the resolution of the distributional estimate. While the DeepMind control suite of tasks utilizes standardized rewards and episode lengths, thus enabling the entire suite to be solved with a single setting of these hyperparameters, this is often not the case. This paper revisits a natural alternative that removes this requirement, namely a mixture of Gaussians, and a simple sample-based loss function to train it in an off-policy regime. We empirically evaluate its performance on a broad range of continuous control tasks and demonstrate that it eliminates the need for these distributional hyperparameters and achieves state-of-the-art performance on a variety of challenging tasks (e.g. the humanoid, dog, quadruped, and manipulator domains). Finally, we provide an implementation in the Acme agent repository.  ( 2 min )
    IIITDWD-ShankarB@ Dravidian-CodeMixi-HASOC2021: mBERT based model for identification of offensive content in south Indian languages. (arXiv:2204.10195v1 [cs.CL])
    In recent years, there has been a lot of focus on offensive content. The amount of offensive content generated by social media is increasing at an alarming rate. This created a greater need to address this issue than ever before. To address these issues, the organizers of "Dravidian-Code Mixed HASOC-2020" have created two challenges. Task 1 involves identifying offensive content in Malayalam data, whereas Task 2 includes Malayalam and Tamil Code Mixed Sentences. Our team participated in Task 2. In our suggested model, we experiment with multilingual BERT to extract features, and three different classifiers are used on extracted features. Our model received a weighted F1 score of 0.70 for Malayalam data and was ranked fifth; we also received a weighted F1 score of 0.573 for Tamil Code Mixed data and were ranked eleventh.  ( 2 min )
    Unsupervised Numerical Reasoning to Extract Phenotypes from Clinical Text by Leveraging External Knowledge. (arXiv:2204.10202v1 [cs.CL])
    Extracting phenotypes from clinical text has been shown to be useful for a variety of clinical use cases such as identifying patients with rare diseases. However, reasoning with numerical values remains challenging for phenotyping in clinical text, for example, temperature 102F representing Fever. Current state-of-the-art phenotyping models are able to detect general phenotypes, but perform poorly when they detect phenotypes requiring numerical reasoning. We present a novel unsupervised methodology leveraging external knowledge and contextualized word embeddings from ClinicalBERT for numerical reasoning in a variety of phenotypic contexts. Comparing against unsupervised benchmarks, it shows a substantial performance improvement with absolute gains on generalized Recall and F1 scores up to 79% and 71%, respectively. In the supervised setting, it also surpasses the performance of alternative approaches with absolute gains on generalized Recall and F1 scores up to 70% and 44%, respectively.  ( 2 min )
    Is Neuron Coverage Needed to Make Person Detection More Robust?. (arXiv:2204.10027v1 [cs.CV])
    The growing use of deep neural networks (DNNs) in safety- and security-critical areas like autonomous driving raises the need for their systematic testing. Coverage-guided testing (CGT) is an approach that applies mutation or fuzzing according to a predefined coverage metric to find inputs that cause misbehavior. With the introduction of a neuron coverage metric, CGT has also recently been applied to DNNs. In this work, we apply CGT to the task of person detection in crowded scenes. The proposed pipeline uses YOLOv3 for person detection and includes finding DNN bugs via sampling and mutation, and subsequent DNN retraining on the updated training set. For a mutated image to count as a bug, we require it to cause a significant performance drop compared to a clean input. In accordance with the CGT, we also consider an additional requirement of increased coverage in the bug definition. In order to explore several types of robustness, our approach includes natural image transformations, corruptions, and adversarial examples generated with the Daedalus attack. The proposed framework has uncovered several thousand cases of incorrect DNN behavior. The relative change in mAP performance of the retrained models reached on average between 26.21% and 64.24% for different robustness types. However, we have found no evidence that the investigated coverage metrics can be advantageously used to improve robustness.  ( 2 min )
    Evolution and use of data science vocabulary. How much have we changed in 13 years?. (arXiv:2204.10174v1 [cs.DL])
    Here I present an investigation on the evolution and use of vocabulary in data science in the last 13 years. Based on a rigorous statistical analysis, a database with 12,787 documents containing the words "data science" in the title, abstract or keywords is analyzed. It is proposed to classify the evolution of this discipline in three periods: emergence, growth and boom. Characteristic words and pioneering documents are identified for each period. By proposing the distinctive vocabulary and relevant topics of data science and classified in time periods, these results add value to the scientific community of this discipline.  ( 2 min )
    Detecting Topology Attacks against Graph Neural Networks. (arXiv:2204.10072v1 [cs.LG])
    Graph neural networks (GNNs) have been widely used in many real applications, and recent studies have revealed their vulnerabilities against topology attacks. To address this issue, existing efforts have mainly been dedicated to improving the robustness of GNNs, while little attention has been paid to the detection of such attacks. In this work, we study the victim node detection problem under topology attacks against GNNs. Our approach is built upon the key observation rooted in the intrinsic message passing nature of GNNs. That is, the neighborhood of a victim node tends to have two competing group forces, pushing the node classification results towards the original label and the targeted label, respectively. Based on this observation, we propose to detect victim nodes by deliberately designing an effective measurement of the neighborhood variance for each node. Extensive experimental results on four real-world datasets and five existing topology attacks show the effectiveness and efficiency of the proposed detection approach.  ( 2 min )
    Working memory inspired hierarchical video decomposition with transformative representations. (arXiv:2204.10105v1 [cs.CV])
    Video decomposition is very important to extract moving foreground objects from complex backgrounds in computer vision, machine learning, and medical imaging, e.g., extracting moving contrast-filled vessels from the complex and noisy backgrounds of X-ray coronary angiography (XCA). However, the challenges caused by dynamic backgrounds, overlapping heterogeneous environments and complex noises still exist in video decomposition. To solve these problems, this study is the first to introduce a flexible visual working memory model in video decomposition tasks to provide interpretable and high-performance hierarchical deep architecture, integrating the transformative representations between sensory and control layers from the perspective of visual and cognitive neuroscience. Specifically, robust PCA unrolling networks acting as a structure-regularized sensor layer decompose XCA into sparse/low-rank structured representations to separate moving contrast-filled vessels from noisy and complex backgrounds. Then, patch recurrent convolutional LSTM networks with a backprojection module embody unstructured random representations of the control layer in working memory, recurrently projecting spatiotemporally decomposed nonlocal patches into orthogonal subspaces for heterogeneous vessel retrieval and interference suppression. This video decomposition deep architecture effectively restores the heterogeneous profiles of intensity and the geometries of moving objects against the complex background interferences. Experiments show that the proposed method significantly outperforms state-of-the-art methods in accurate moving contrast-filled vessel extraction with excellent flexibility and computational efficiency.  ( 2 min )
    Robustness of Machine Learning Models Beyond Adversarial Attacks. (arXiv:2204.10046v1 [cs.LG])
    Correctly quantifying the robustness of machine learning models is a central aspect in judging their suitability for specific tasks, and thus, ultimately, for generating trust in the models. We show that the widely used concept of adversarial robustness and closely related metrics based on counterfactuals are not necessarily valid metrics for determining the robustness of ML models against perturbations that occur "naturally", outside specific adversarial attack scenarios. Additionally, we argue that generic robustness metrics in principle are insufficient for determining real-world robustness. Instead we propose a flexible approach that models possible perturbations in input data individually for each application. This is then combined with a probabilistic approach that computes the likelihood that a real-world perturbation will change a prediction, thus giving quantitative information of the robustness of the trained machine learning model. The method does not require access to the internals of the classifier and thus in principle works for any black-box model. It is, however, based on Monte-Carlo sampling and thus only suited for input spaces with small dimensions. We illustrate our approach on two datasets, as well as on analytically solvable cases. Finally, we discuss ideas on how real-world robustness could be computed or estimated in high-dimensional input spaces.  ( 2 min )
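The probabilistic idea in the abstract above (estimating how likely an application-specific perturbation is to change a black-box prediction) can be sketched with plain Monte Carlo sampling. This is an illustrative sketch under stated assumptions, not the paper's implementation: `flip_probability`, the threshold classifier, and the Gaussian noise model are all hypothetical names and choices.

```python
import random

def flip_probability(predict, x, perturb, n_samples=10_000, seed=0):
    """Monte Carlo estimate of P(predict(perturb(x)) != predict(x)).

    predict : black-box classifier, called only through its outputs
    perturb : application-specific perturbation model, perturb(x, rng) -> x'
    """
    rng = random.Random(seed)
    baseline = predict(x)
    flips = sum(predict(perturb(x, rng)) != baseline for _ in range(n_samples))
    return flips / n_samples

# Hypothetical black-box model: a one-dimensional threshold at zero.
def threshold_classifier(x):
    return int(x[0] > 0.0)

# Hypothetical perturbation model: additive Gaussian noise with unit variance.
def gaussian_noise(x, rng, sigma=1.0):
    return [xi + rng.gauss(0.0, sigma) for xi in x]

p_far = flip_probability(threshold_classifier, [3.0], gaussian_noise)
p_near = flip_probability(threshold_classifier, [0.1], gaussian_noise)
print(f"far from boundary: {p_far:.4f}, near boundary: {p_near:.4f}")
```

For this toy case the flip probability is analytically the standard normal CDF evaluated at -x/sigma, so the estimate can be checked by hand. The number of samples needed to cover the input space grows quickly with dimension, which is consistent with the abstract's caveat about small input spaces.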
    Fluctuation-based Outlier Detection. (arXiv:2204.10007v1 [cs.LG])
    Outlier detection is an important topic in machine learning and has been used in a wide range of applications. Outliers are objects that are few in number and deviate from the majority of objects. As a result of these two properties, we show that outliers are susceptible to a mechanism called fluctuation. This article proposes a method called fluctuation-based outlier detection (FBOD) that achieves a low linear time complexity and detects outliers purely based on the concept of fluctuation without employing any distance, density or isolation measure, which makes it fundamentally different from all existing methods. FBOD first converts the Euclidean structure datasets into graphs by using random links, then propagates the feature value according to the connection of the graph. Finally, by comparing the difference between the fluctuation of an object and its neighbors, FBOD determines the object with a larger difference as an outlier. The results of experiments comparing FBOD with seven state-of-the-art algorithms on eight real-world tabular datasets and three video datasets show that FBOD outperforms its competitors in the majority of cases and that FBOD has only 5% of the execution time of the fastest algorithm. The experiment codes are available at: https://github.com/FluctuationOD/Fluctuation-based-Outlier-Detection.  ( 2 min )
    Multi-Tier Platform for Cognizing Massive Electroencephalogram. (arXiv:2204.09840v1 [eess.SP])
    An end-to-end platform assembling multiple tiers is built for precisely cognizing brain activities. Being fed massive electroencephalogram (EEG) data, the time-frequency spectrograms are conventionally projected into the episode-wise feature matrices (seen as tier-1). A spiking neural network (SNN) based tier is designed to distill the principal information in terms of spike-streams from the rare features, which maintains the temporal implication in the nature of EEGs. The proposed tier-3 transposes time- and space-domain of spike patterns from the SNN; and feeds the transposed pattern-matrices into an artificial neural network (ANN, Transformer specifically) known as tier-4, where a special spanning topology is proposed to match the two-dimensional input form. In this manner, cognition such as classification is conducted with high accuracy. For proof-of-concept, the sleep stage scoring problem is demonstrated by introducing multiple EEG datasets with the largest comprising 42,560 hours recorded from 5,793 subjects. From experiment results, our platform achieves the general cognition overall accuracy of 87% by leveraging sole EEG, which is 2% superior to the state-of-the-art. Moreover, our developed multi-tier methodology offers visible and graphical interpretations of the temporal characteristics of EEG by identifying the critical episodes, which is demanded in neurodynamics but hardly appears in conventional cognition scenarios.  ( 2 min )
    A data filling methodology for time series based on CNN and (Bi)LSTM neural networks. (arXiv:2204.09994v1 [cs.LG])
    In the process of collecting data from sensors, several circumstances can affect their continuity and validity, resulting in alterations of the data or loss of information. Although classical methods of statistics, such as interpolation-like techniques, can be used to approximate the missing data in a time series, the recent developments in Deep Learning (DL) have given impetus to innovative and much more accurate forecasting techniques. In the present paper, we develop two DL models aimed at filling data gaps, for the specific case of internal temperature time series obtained from monitored apartments located in Bolzano, Italy. The DL models developed in the present work are based on the combination of Convolutional Neural Networks (CNNs), Long Short-Term Memory Neural Networks (LSTMs), and Bidirectional LSTMs (BiLSTMs). Two key features of our models are the use of both pre- and post-gap data, and the exploitation of a correlated time series (the external temperature) in order to predict the target one (the internal temperature). Our approach manages to capture the fluctuating nature of the data and shows good accuracy in reconstructing the target time series. In addition, our models significantly improve the already good results from another DL architecture that is used as a baseline for the present work.  ( 2 min )
    A Learned Index for Exact Similarity Search in Metric Spaces. (arXiv:2204.10028v1 [cs.DB])
    Indexing is an effective way to support efficient query processing in large databases. Recently the concept of learned index has been explored actively to replace or supplement traditional index structures with machine learning models to reduce storage and search costs. However, accurate and efficient similarity query processing in high-dimensional metric spaces remains an open challenge. In this paper, a novel indexing approach called LIMS is proposed to use data clustering and pivot-based data transformation techniques to build learned indexes for efficient similarity query processing in metric spaces. The underlying data is partitioned into clusters such that each cluster follows a relatively uniform data distribution. Data redistribution is achieved by utilizing a small number of pivots for each cluster. Similar data are mapped into compact regions and the mapped values are totally ordinal. Machine learning models are developed to approximate the position of each data record on the disk. Efficient algorithms are designed for processing range queries and nearest neighbor queries based on LIMS, and for index maintenance with dynamic updates. Extensive experiments on real-world and synthetic datasets demonstrate the superiority of LIMS compared with traditional indexes and state-of-the-art learned indexes.  ( 2 min )
    Inducing Gaussian Process Networks. (arXiv:2204.09889v1 [cs.LG])
    Gaussian processes (GPs) are powerful but computationally expensive machine learning models, requiring an estimate of the kernel covariance matrix for every prediction. In large and complex domains, such as graphs, sets, or images, the choice of a suitable kernel can also be non-trivial to determine, providing an additional obstacle to the learning task. Over the last decade, these challenges have resulted in significant advances being made in terms of scalability and expressivity, exemplified by, e.g., the use of inducing points and neural network kernel approximations. In this paper, we propose inducing Gaussian process networks (IGN), a simple framework for simultaneously learning the feature space as well as the inducing points. The inducing points, in particular, are learned directly in the feature space, enabling a seamless representation of complex structured domains while also facilitating scalable gradient-based learning methods. We consider both regression and (binary) classification tasks and report on experimental results for real-world data sets showing that IGNs provide significant advances over state-of-the-art methods. We also demonstrate how IGNs can be used to effectively model complex domains using neural network architectures.  ( 2 min )
    Deep transfer learning for partial differential equations under conditional shift with DeepONet. (arXiv:2204.09810v1 [cs.LG])
    Traditional machine learning algorithms are designed to learn in isolation, i.e. address single tasks. The core idea of transfer learning (TL) is that knowledge gained in learning to perform one task (source) can be leveraged to improve learning performance in a related, but different, task (target). TL leverages and transfers previously acquired knowledge to address the expense of data acquisition and labeling, potential computational power limitations, and the dataset distribution mismatches. Although significant progress has been made in the fields of image processing, speech recognition, and natural language processing (for classification and regression) for TL, little work has been done in the field of scientific machine learning for functional regression and uncertainty quantification in partial differential equations. In this work, we propose a novel TL framework for task-specific learning under conditional shift with a deep operator network (DeepONet). Inspired by the conditional embedding operator theory, we measure the statistical distance between the source domain and the target feature domain by embedding conditional distributions onto a reproducing kernel Hilbert space. Task-specific operator learning is accomplished by fine-tuning task-specific layers of the target DeepONet using a hybrid loss function that allows for the matching of individual target samples while also preserving the global properties of the conditional distribution of target data. We demonstrate the advantages of our approach for various TL scenarios involving nonlinear PDEs under conditional shift. Our results include geometry domain adaptation and show that the proposed TL framework enables fast and efficient multi-task operator learning, despite significant differences between the source and target domains.  ( 2 min )
    Fairness in Graph Mining: A Survey. (arXiv:2204.09888v1 [cs.LG])
    Graph mining algorithms have been playing a significant role in myriad fields over the years. However, despite their promising performance on various graph analytical tasks, most of these algorithms lack fairness considerations. As a consequence, they could lead to discrimination towards certain populations when exploited in human-centered applications. Recently, algorithmic fairness has been extensively studied in graph-based applications. In contrast to algorithmic fairness on independent and identically distributed (i.i.d.) data, fairness in graph mining has exclusive backgrounds, taxonomies, and fulfilling techniques. In this survey, we provide a comprehensive and up-to-date introduction of existing literature under the context of fair graph mining. Specifically, we propose a novel taxonomy of fairness notions on graphs, which sheds light on their connections and differences. We further present an organized summary of existing techniques that promote fairness in graph mining. Finally, we summarize the widely used datasets in this emerging research field and provide insights on current research challenges and open questions, aiming at encouraging cross-breeding ideas and further advances.  ( 2 min )
    Ultra Marginal Feature Importance. (arXiv:2204.09938v1 [stat.ML])
    Scientists frequently prioritize learning from data rather than training the best possible model; however, research in machine learning often prioritizes the latter. The development of marginal feature importance methods, such as marginal contribution feature importance, attempts to break this trend by providing a useful framework for explaining relationships in data in an interpretable fashion. In this work, we generalize the framework of marginal contribution feature importance to improve performance with regard to detecting correlated interactions and reducing runtime. To do so, we consider "information subsets" of the set of features $F$ and show that our importance metric can be computed directly after applying fair representation learning methods from the AI fairness literature. The methods of optimal transport and linear regression are considered and explored experimentally for removing all the information of our feature of interest $f$ from the feature set $F$. Given these implementations, we show on real and simulated data that ultra marginal feature importance performs at least as well as marginal contribution feature importance, with substantially faster computation time and better performance in the presence of correlated interactions and unrelated features.  ( 2 min )
    Perception Visualization: Seeing Through the Eyes of a DNN. (arXiv:2204.09920v1 [cs.CV])
    Artificial intelligence (AI) systems power the world we live in. Deep neural networks (DNNs) are able to solve tasks in an ever-expanding landscape of scenarios, but our eagerness to apply these powerful models leads us to focus on their performance and deprioritises our ability to understand them. Current research in the field of explainable AI tries to bridge this gap by developing various perturbation or gradient-based explanation techniques. For images, these techniques fail to fully capture and convey the semantic information needed to elucidate why the model makes the predictions it does. In this work, we develop a new form of explanation that is radically different in nature from current explanation methods, such as Grad-CAM. Perception visualization provides a visual representation of what the DNN perceives in the input image by depicting what visual patterns the latent representation corresponds to. Visualizations are obtained through a reconstruction model that inverts the encoded features, such that the parameters and predictions of the original models are not modified. Results of our user study demonstrate that humans can better understand and predict the system's decisions when perception visualizations are available, thus easing the debugging and deployment of deep models as trusted systems.  ( 2 min )
    GUARD: Graph Universal Adversarial Defense. (arXiv:2204.09803v1 [cs.LG])
    Recently, graph convolutional networks (GCNs) have been shown to be vulnerable to small adversarial perturbations, which becomes a severe threat and largely limits their applications in security-critical scenarios. To mitigate such a threat, considerable research efforts have been devoted to increasing the robustness of GCNs against adversarial attacks. However, current approaches for defense are typically designed for the whole graph and consider the global performance, posing challenges in protecting important local nodes from stronger adversarial targeted attacks. In this work, we present a simple yet effective method, named Graph Universal Adversarial Defense (GUARD). Unlike previous works, GUARD protects each individual node from attacks with a universal defensive patch, which is generated once and can be applied to any node (node-agnostic) in a graph. Extensive experiments on four benchmark datasets demonstrate that our method significantly improves robustness for several established GCNs against multiple adversarial attacks and outperforms existing adversarial defense methods by large margins. Our code is publicly available at https://github.com/EdisonLeeeee/GUARD.  ( 2 min )
    MedFACT: Modeling Medical Feature Correlations in Patient Health Representation Learning via Feature Clustering. (arXiv:2204.10011v1 [cs.LG])
    In healthcare prediction tasks, it is essential to exploit the correlations between medical features and learn better patient health representations. Existing methods try to estimate feature correlations only from data, or increase the quality of estimation by introducing task-specific medical knowledge. However, such methods either struggle to estimate the feature correlations due to insufficient training samples, or cannot be generalized to other tasks due to reliance on specific knowledge. Medical research has revealed that not all the medical features are strongly correlated. Thus, to address these issues, we aim to group strongly correlated features and learn feature correlations in a group-wise manner to reduce the learning complexity without losing generality. In this paper, we propose a general patient health representation learning framework MedFACT. We estimate correlations via measuring similarity between temporal patterns of medical features with kernel methods, and cluster features with strong correlations into groups. The feature group is further formulated as a correlation graph, and we employ graph convolutional networks to conduct group-wise feature interactions for better representation learning. Experiments on two real-world datasets demonstrate the superiority of MedFACT. The discovered medical findings are also confirmed by literature, providing valuable medical insights and explanations.  ( 2 min )
    fairDMS: Rapid Model Training by Data and Model Reuse. (arXiv:2204.09805v1 [cs.LG])
    Extracting actionable information from data sources such as the Linac Coherent Light Source (LCLS-II) and Advanced Photon Source Upgrade (APS-U) is becoming more challenging due to the fast-growing data generation rate. The rapid analysis possible with ML methods can enable fast feedback loops that can be used to adjust experimental setups in real-time, for example when errors occur or interesting events are detected. However, to avoid degradation in ML performance over time due to changes in an instrument or sample, we need a way to update ML models rapidly while an experiment is running. We present here a data service and model service to accelerate deep neural network training with a focus on ML-based scientific applications. Our proposed data service achieves a 100x speedup in terms of data labeling compared to the current state-of-the-art. Further, our model service achieves up to 200x improvement in training speed. Overall, fairDMS achieves up to 92x speedup in terms of end-to-end model updating time.  ( 2 min )
    Sample-Efficient Reinforcement Learning for POMDPs with Linear Function Approximations. (arXiv:2204.09787v1 [cs.LG])
    Despite the success of reinforcement learning (RL) for Markov decision processes (MDPs) with function approximation, most RL algorithms easily fail if the agent only has partial observations of the state. Such a setting is often modeled as a partially observable Markov decision process (POMDP). Existing sample-efficient algorithms for POMDPs are restricted to the tabular setting where the state and observation spaces are finite. In this paper, we make the first attempt at tackling the tension between function approximation and partial observability. Specifically, we focus on a class of undercomplete POMDPs with linear function approximations, which allows the state and observation spaces to be infinite. For such POMDPs, we show that the optimal policy and value function can be characterized by a sequence of finite-memory Bellman operators. We propose an RL algorithm that constructs optimistic estimators of these operators via reproducing kernel Hilbert space (RKHS) embedding. Moreover, we theoretically prove that the proposed algorithm finds an $\varepsilon$-optimal policy with $\tilde O (1/\varepsilon^2)$ episodes of exploration. Also, this sample complexity only depends on the intrinsic dimension of the POMDP polynomially and is independent of the size of the state and observation spaces. To the best of our knowledge, we develop the first provably sample-efficient algorithm for POMDPs with function approximation.  ( 2 min )
    Scaling Language Model Size in Cross-Device Federated Learning. (arXiv:2204.09715v1 [cs.CL])
    Most studies in cross-device federated learning focus on small models, due to the server-client communication and on-device computation bottlenecks. In this work, we leverage various techniques for mitigating these bottlenecks to train larger language models in cross-device federated learning. With systematic applications of partial model training, quantization, efficient transfer learning, and communication-efficient optimizers, we are able to train a $21$M parameter Transformer that achieves the same perplexity as that of a similarly sized LSTM with $\sim10\times$ smaller client-to-server communication cost and $11\%$ lower perplexity than smaller LSTMs commonly studied in literature.  ( 2 min )
    A Hierarchical Bayesian Approach to Inverse Reinforcement Learning with Symbolic Reward Machines. (arXiv:2204.09772v1 [cs.AI])
    A misspecified reward can degrade sample efficiency and induce undesired behaviors in reinforcement learning (RL) problems. We propose symbolic reward machines for incorporating high-level task knowledge when specifying the reward signals. Symbolic reward machines augment existing reward machine formalism by allowing transitions to carry predicates and symbolic reward outputs. This formalism lends itself well to inverse reinforcement learning, whereby the key challenge is determining appropriate assignments to the symbolic values from a few expert demonstrations. We propose a hierarchical Bayesian approach for inferring the most likely assignments such that the concretized reward machine can discriminate expert demonstrated trajectories from other trajectories with high accuracy. Experimental results show that learned reward machines can significantly improve training efficiency for complex RL tasks and generalize well across different task environment configurations.  ( 2 min )
    Federated Learning for Energy-limited Wireless Networks: A Partial Model Aggregation Approach. (arXiv:2204.09746v1 [cs.LG])
    The limited communication resources, e.g., bandwidth and energy, and data heterogeneity across devices are two of the main bottlenecks for federated learning (FL). To tackle these challenges, we first devise a novel FL framework with partial model aggregation (PMA), which only aggregates the lower layers of neural networks responsible for feature extraction while the upper layers corresponding to complex pattern recognition remain at devices for personalization. The proposed PMA-FL is able to address the data heterogeneity and reduce the transmitted information in wireless channels. We then obtain a convergence bound of the framework under a non-convex loss function setting. With the aid of this bound, we define a new objective function, named the scheduled data sample volume, to transfer the original inexplicit optimization problem into a tractable one for device scheduling, bandwidth allocation, computation and communication time division. Our analysis reveals that the optimal time division is achieved when the communication and computation parts of PMA-FL have the same power. We also develop a bisection method to solve the optimal bandwidth allocation policy and use the set expansion algorithm to address the optimal device scheduling. Compared with the state-of-the-art benchmarks, the proposed PMA-FL improves accuracy by 2.72% and 11.6% on two typical heterogeneous datasets, i.e., MNIST and CIFAR-10, respectively. In addition, the proposed joint dynamic device scheduling and resource optimization approach achieves slightly higher accuracy than the considered benchmarks, but it provides a satisfactory energy and time reduction: 29% energy or 20% time reduction on MNIST, and 25% energy or 12.5% time reduction on CIFAR-10.  ( 2 min )
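The partial model aggregation step described in the abstract above (average only the lower, feature-extraction layers while each device keeps its own upper layers) can be sketched in a few lines. This is a toy illustration, not the paper's algorithm: the function name `aggregate_shared_layers`, the layer names, the flat-list parameter format, and the FedAvg-style weighting by a per-device weight are all assumptions.

```python
def aggregate_shared_layers(device_models, shared_layers, weights):
    """Weighted-average only the shared layers across devices.

    device_models : list of dicts mapping layer name -> list of floats
    shared_layers : names of the lower layers that are aggregated
    weights       : per-device weights (assumed here: local sample counts)
    Layers not listed in shared_layers are left untouched on each device.
    """
    total = sum(weights)
    averaged = {}
    for name in shared_layers:
        size = len(device_models[0][name])
        averaged[name] = [
            sum(w * m[name][i] for m, w in zip(device_models, weights)) / total
            for i in range(size)
        ]
    # Each device receives its own copy of the averaged lower layers
    # and keeps its personalised upper layers.
    return [
        {**m, **{name: list(values) for name, values in averaged.items()}}
        for m in device_models
    ]

# Hypothetical two-device example with one shared and one personal layer.
devices = [
    {"feature_extractor": [1.0, 2.0], "head": [10.0]},
    {"feature_extractor": [3.0, 6.0], "head": [20.0]},
]
updated = aggregate_shared_layers(devices, ["feature_extractor"], weights=[1, 3])
print(updated)
```

Because only the shared layers are exchanged, the transmitted payload shrinks in proportion to the size of the layers left on the device, which matches the communication saving the abstract attributes to this scheme.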
    Exact Formulas for Finite-Time Estimation Errors of Decentralized Temporal Difference Learning with Linear Function Approximation. (arXiv:2204.09801v1 [cs.LG])
    In this paper, we consider the policy evaluation problem in multi-agent reinforcement learning (MARL) and derive exact closed-form formulas for the finite-time mean-squared estimation errors of decentralized temporal difference (TD) learning with linear function approximation. Our analysis hinges upon the fact that the decentralized TD learning method can be viewed as a Markov jump linear system (MJLS). Then standard MJLS theory can be applied to quantify the mean and covariance matrix of the estimation error of the decentralized TD method at every time step. Various implications of our exact formulas on the algorithm performance are also discussed. An interesting finding is that under a necessary and sufficient stability condition, the mean-squared TD estimation error will converge to an exact limit at a specific exponential rate.  ( 2 min )
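    For readers unfamiliar with the building block analyzed here, the following is a minimal single-agent sketch of a TD(0) update with linear function approximation. It does not implement the decentralized scheme or the MJLS analysis from the paper; the function name `td0_update` and the default step size and discount factor are assumptions chosen for illustration.

```python
def td0_update(theta, phi, phi_next, reward, gamma=0.9, alpha=0.1):
    """One TD(0) step for a linear value estimate V(s) = theta . phi(s).

    theta:    current weight vector (list of floats)
    phi:      feature vector of the current state
    phi_next: feature vector of the next state
    Returns a new weight vector; the input list is not modified.
    """
    value = sum(t * f for t, f in zip(theta, phi))
    value_next = sum(t * f for t, f in zip(theta, phi_next))

    # Temporal-difference error: reward plus discounted next value,
    # minus the current value estimate.
    delta = reward + gamma * value_next - value

    # Move the weights along the current state's features.
    return [t + alpha * delta * f for t, f in zip(theta, phi)]
```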
    Matching Writers to Content Writing Tasks. (arXiv:2204.09718v1 [cs.CL])
    Businesses need content. In various forms and formats and for varied purposes. In fact, the content marketing industry is set to be worth $412.88 billion by the end of 2021. However, according to the Content Marketing Institute, creating engaging content is the #1 challenge that marketers face today. We understand that producing great content requires great writers who understand the business and can weave their message into reader (and search engine) friendly content. In this project, the team has attempted to bridge the gap between writers and projects by using AI and ML tools. We used NLP techniques to analyze thousands of publicly available business articles (corpora) to extract various defining factors for each writing sample. Through this project we aim to automate the highly time-consuming, and often biased task of manually shortlisting the most suitable writer for a given content writing requirement. We believe that a tool like this will have far reaching positive implications for both parties - businesses looking for suitable talent for niche writing jobs as well as experienced writers and Subject Matter Experts (SMEs) wanting to lend their services to content marketing projects. The business gets the content they need, the content writer/SME gets a chance to leverage his or her talent, while the reader gets authentic content that adds real value.  ( 2 min )
    Generative Pre-Trained Transformers for Biologically Inspired Design. (arXiv:2204.09714v1 [cs.CL])
    Biological systems in nature have evolved for millions of years to adapt and survive the environment. Many features they developed can be inspirational and beneficial for solving technical problems in modern industries. This leads to a novel form of design-by-analogy called bio-inspired design (BID). Although BID as a design method has been proven beneficial, the gap between biology and engineering continuously hinders designers from effectively applying the method. Therefore, we explore the recent advance of artificial intelligence (AI) for a computational approach to bridge the gap. This paper proposes a generative design approach based on the pre-trained language model (PLM) to automatically retrieve and map biological analogy and generate BID in the form of natural language. The latest generative pre-trained transformer, namely GPT-3, is used as the base PLM. Three types of design concept generators are identified and fine-tuned from the PLM according to the looseness of the problem space representation. Machine evaluators are also fine-tuned to assess the correlation between the domains within the generated BID concepts. The approach is then tested via a case study in which the fine-tuned models are applied to generate and evaluate light-weighted flying car concepts inspired by nature. The results show our approach can generate BID concepts with good performance.  ( 2 min )
    A majorization-minimization algorithm for nonnegative binary matrix factorization. (arXiv:2204.09741v1 [cs.LG])
    This paper tackles the problem of decomposing binary data using matrix factorization. We consider the family of mean-parametrized Bernoulli models, a class of generative models that are well suited for modeling binary data and enables interpretability of the factors. We factorize the Bernoulli parameter and consider an additional Beta prior on one of the factors to further improve the model's expressive power. While similar models have been proposed in the literature, they only exploit the Beta prior as a proxy to ensure a valid Bernoulli parameter in a Bayesian setting; in practice it reduces to a uniform or uninformative prior. Besides, estimation in these models has focused on costly Bayesian inference. In this paper, we propose a simple yet very efficient majorization-minimization algorithm for maximum a posteriori estimation. Our approach leverages the Beta prior whose parameters can be tuned to improve performance in matrix completion tasks. Experiments conducted on three public binary datasets show that our approach offers an excellent trade-off between prediction performance, computational complexity, and interpretability.  ( 2 min )
    FS-NCSR: Increasing Diversity of the Super-Resolution Space via Frequency Separation and Noise-Conditioned Normalizing Flow. (arXiv:2204.09679v1 [cs.CV])
    Super-resolution suffers from an innate ill-posed problem that a single low-resolution (LR) image can be from multiple high-resolution (HR) images. Recent studies on the flow-based algorithm solve this ill-posedness by learning the super-resolution space and predicting diverse HR outputs. Unfortunately, the diversity of the super-resolution outputs is still unsatisfactory, and the outputs from the flow-based model usually suffer from undesired artifacts which cause low-quality outputs. In this paper, we propose FS-NCSR which produces diverse and high-quality super-resolution outputs using frequency separation and noise conditioning compared to the existing flow-based approaches. As the sharpness and high-quality detail of the image rely on its high-frequency information, FS-NCSR only estimates the high-frequency information of the high-resolution outputs without redundant low-frequency components. Through this, FS-NCSR significantly improves the diversity score without significant image quality degradation compared to the NCSR, the winner of the previous NTIRE 2021 challenge.  ( 2 min )
  • Open

    Deep Learning meets Nonparametric Regression: Are Weight-Decayed DNNs Locally Adaptive?. (arXiv:2204.09664v2 [cs.LG] UPDATED)
    We study the theory of neural networks (NNs) through the lens of classical nonparametric regression problems with a focus on NNs' ability to adaptively estimate functions with heterogeneous smoothness -- a property of functions in Besov or Bounded Variation (BV) classes. Existing work on this problem requires tuning the NN architecture based on the function spaces and sample sizes. We consider a "Parallel NN" variant of deep ReLU networks and show that the standard weight decay is equivalent to promoting the $\ell_p$-sparsity ($0<p<1$) of the coefficient vector of an end-to-end learned function bases, i.e., a dictionary. Using this equivalence, we further establish that by tuning only the weight decay, such Parallel NN achieves an estimation error arbitrarily close to the minimax rates for both the Besov and BV classes. Notably, it gets exponentially closer to minimax optimal as the NN gets deeper. Our research sheds new light on why depth matters and how NNs are more powerful than kernel methods.  ( 2 min )
    The Silent Problem -- Machine Learning Model Failure -- How to Diagnose and Fix Ailing Machine Learning Models. (arXiv:2204.10227v1 [cs.LG])
    The COVID-19 pandemic has dramatically changed how healthcare is delivered to patients, how patients interact with healthcare providers, and how healthcare information is disseminated to both healthcare providers and patients. Analytical models that were trained and tested pre-pandemic may no longer be performing up to expectations, yielding unreliable and irrelevant machine learning (ML) models, given that ML depends on the basic principle that what happened in the past is likely to repeat in the future. ML faces two important degradation principles: concept drift, when the underlying properties and characteristics of the variables change, and data drift, when the data distributions, probabilities, co-variates, and other variable relationships change. Both are prime culprits of model failure. Therefore, detecting and diagnosing drift in existing models has become imperative. Perhaps even more important is a shift in our mindset towards a conscious recognition that drift is inevitable, and that model building must incorporate intentional resilience, the ability to offset and recover quickly from failure, and proactive robustness, avoiding failure by developing models that are less vulnerable to drift and disruption.  ( 2 min )
    Strong posterior contraction rates via Wasserstein dynamics. (arXiv:2203.10754v2 [math.ST] UPDATED)
    In this paper, we develop a novel approach to posterior contraction rates (PCRs), for both finite-dimensional (parametric) and infinite-dimensional (nonparametric) Bayesian models. Critical to our approach is the combination of an assumption of local Lipschitz-continuity for the posterior distribution with a dynamic formulation of the Wasserstein distance, here referred to as Wasserstein dynamics, which allows us to set forth a connection between the problem of establishing PCRs and some classical problems in mathematical analysis, probability theory and mathematical statistics: the Laplace method for approximating integrals, Sanov's large deviation principles in the Wasserstein distance, rates of convergence of the mean Glivenko-Cantelli theorem, and estimates of weighted Poincaré-Wirtinger constants. Under dominated Bayesian models, we present two main results: i) a theorem on PCRs for the regular infinite-dimensional exponential family of statistical models; ii) a theorem on PCRs for a general dominated statistical model. Some applications of our results are presented for the regular parametric model, the multinomial model, the finite-dimensional and the infinite-dimensional logistic-Gaussian model and the infinite-dimensional linear regression. In general, our results lead to optimal PCRs in finite dimension, whereas in infinite dimension it is shown how the prior distribution may affect PCRs. With regard to infinite-dimensional Bayesian models for density estimation, our approach to PCRs is the first to consider strong norm distances on parameter spaces of functions, such as Sobolev-like norms, as most of the approaches in the classical (frequentist) and Bayesian literature deal with spaces of density functions endowed with $\mathrm{L}^p$ norms or the Hellinger distance.  ( 2 min )
    Towards Resolving Propensity Contradiction in Offline Recommender Learning. (arXiv:1910.07295v6 [stat.ML] UPDATED)
    We study offline recommender learning from explicit rating feedback in the presence of selection bias. A current promising solution for the bias is the inverse propensity score (IPS) estimation. However, the performance of existing propensity-based methods can suffer significantly from the propensity estimation bias. In fact, most of the previous IPS-based methods require some amount of missing-completely-at-random (MCAR) data to accurately estimate the propensity. This leads to a critical self-contradiction; IPS is ineffective without MCAR data, even though it originally aims to learn recommenders from only missing-not-at-random feedback. To resolve this propensity contradiction, we derive a propensity-independent generalization error bound and propose a novel algorithm to minimize the theoretical bound via adversarial learning. Our theory and algorithm do not require a propensity estimation procedure, thereby leading to a well-performing rating predictor without the true propensity information. Extensive experiments demonstrate that the proposed approach is superior to a range of existing methods both in rating prediction and ranking metrics in practical settings without MCAR data.  ( 2 min )
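    As background for the inverse propensity score (IPS) estimation that this abstract argues against relying on, here is a minimal sketch of an IPS-weighted squared-error estimate. It illustrates the baseline idea only, not the propensity-independent method the paper proposes; the function name `ips_squared_error` and its argument layout are assumptions.

```python
def ips_squared_error(ratings, predictions, propensities, n_total):
    """IPS estimate of the mean squared error over all user-item pairs.

    ratings, predictions, propensities: parallel lists covering only the
    observed user-item pairs; each propensity is the (estimated)
    probability that the corresponding rating was observed.
    n_total: total number of user-item pairs, observed or not.
    """
    if not (len(ratings) == len(predictions) == len(propensities)):
        raise ValueError("inputs must have the same length")
    if n_total <= 0:
        raise ValueError("n_total must be positive")

    total = 0.0
    for r, p, q in zip(ratings, predictions, propensities):
        if not 0.0 < q <= 1.0:
            raise ValueError("propensities must lie in (0, 1]")
        # Rarely observed pairs (small q) are up-weighted to offset
        # selection bias in which ratings get observed.
        total += (r - p) ** 2 / q
    return total / n_total
```

    The estimate is only as good as the propensities passed in, which is the dependence on propensity estimation that the abstract identifies as the weak point.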
    Wrapped Distributions on homogeneous Riemannian manifolds. (arXiv:2204.09790v1 [math.ST])
    We provide a general framework for constructing probability distributions on Riemannian manifolds, taking advantage of area-preserving maps and isometries. Control over distributions' properties, such as parameters, symmetry and modality, yields a family of flexible distributions that are straightforward to sample from, suitable for use within Monte Carlo algorithms and latent variable models, such as autoencoders. As an illustration, we empirically validate our approach by utilizing our proposed distributions within a variational autoencoder and a latent space network model. Finally, we take advantage of the generalized description of this framework to posit questions for future work.  ( 2 min )
    Sample-Efficient Reinforcement Learning for POMDPs with Linear Function Approximations. (arXiv:2204.09787v1 [cs.LG])
    Despite the success of reinforcement learning (RL) for Markov decision processes (MDPs) with function approximation, most RL algorithms easily fail if the agent only has partial observations of the state. Such a setting is often modeled as a partially observable Markov decision process (POMDP). Existing sample-efficient algorithms for POMDPs are restricted to the tabular setting where the state and observation spaces are finite. In this paper, we make the first attempt at tackling the tension between function approximation and partial observability. Specifically, we focus on a class of undercomplete POMDPs with linear function approximations, which allows the state and observation spaces to be infinite. For such POMDPs, we show that the optimal policy and value function can be characterized by a sequence of finite-memory Bellman operators. We propose an RL algorithm that constructs optimistic estimators of these operators via reproducing kernel Hilbert space (RKHS) embedding. Moreover, we theoretically prove that the proposed algorithm finds an $\varepsilon$-optimal policy with $\tilde O (1/\varepsilon^2)$ episodes of exploration. Also, this sample complexity only depends on the intrinsic dimension of the POMDP polynomially and is independent of the size of the state and observation spaces. To the best of our knowledge, we develop the first provably sample-efficient algorithm for POMDPs with function approximation.
    Computationally Efficient and Statistically Optimal Robust Low-rank Matrix and Tensor Estimation. (arXiv:2203.00953v3 [math.ST] UPDATED)
    Low-rank matrix estimation under heavy-tailed noise is challenging, both computationally and statistically. Convex approaches have been proven statistically optimal but suffer from high computational costs, especially since robust loss functions are usually non-smooth. More recently, computationally fast non-convex approaches via sub-gradient descent are proposed, which, unfortunately, fail to deliver a statistically consistent estimator even under sub-Gaussian noise. In this paper, we introduce a novel Riemannian sub-gradient (RsGrad) algorithm which is not only computationally efficient with linear convergence but also is statistically optimal, be the noise Gaussian or heavy-tailed. Convergence theory is established for a general framework and specific applications to absolute loss, Huber loss, and quantile loss are investigated. Compared with existing non-convex methods, ours reveals a surprising phenomenon of dual-phase convergence. In phase one, RsGrad behaves as in a typical non-smooth optimization that requires gradually decaying stepsizes. However, phase one only delivers a statistically sub-optimal estimator which is already observed in the existing literature. Interestingly, during phase two, RsGrad converges linearly as if minimizing a smooth and strongly convex objective function and thus a constant stepsize suffices. Underlying the phase-two convergence is the smoothing effect of random noise to the non-smooth robust losses in an area close but not too close to the truth. Lastly, RsGrad is applicable for low-rank tensor estimation under heavy-tailed noise where a statistically optimal rate is attainable with the same phenomenon of dual-phase convergence, and a novel shrinkage-based second-order moment method is guaranteed to deliver a warm initialization. Numerical simulations confirm our theoretical discovery and showcase the superiority of RsGrad over prior methods.
    Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning. (arXiv:2106.09226v2 [cs.LG] UPDATED)
    Pretrained language models have achieved state-of-the-art performance when adapted to a downstream NLP task. However, theoretical analysis of these models is scarce and challenging since the pretraining and downstream tasks can be very different. We propose an analysis framework that links the pretraining and downstream tasks with an underlying latent variable generative model of text -- the downstream classifier must recover a function of the posterior distribution over the latent variables. We analyze head tuning (learning a classifier on top of the frozen pretrained model) and prompt tuning in this setting. The generative model in our analysis is either a Hidden Markov Model (HMM) or an HMM augmented with a latent memory component, motivated by long-term dependencies in natural language. We show that 1) under certain non-degeneracy conditions on the HMM, simple classification heads can solve the downstream task, 2) prompt tuning obtains downstream guarantees with weaker non-degeneracy conditions, and 3) our recovery guarantees for the memory-augmented HMM are stronger than for the vanilla HMM because task-relevant information is easier to recover from the long-term memory. Experiments on synthetically generated data from HMMs back our theoretical findings.
    Bayesian Learning via Neural Schrödinger-Föllmer Flows. (arXiv:2111.10510v8 [stat.ML] UPDATED)
    In this work we explore a new framework for approximate Bayesian inference in large datasets based on stochastic control (i.e. Schrödinger bridges). We advocate stochastic control as a finite time and low variance alternative to popular steady-state methods such as stochastic gradient Langevin dynamics (SGLD). Furthermore, we discuss and adapt the existing theoretical guarantees of this framework and establish connections to already existing VI routines in SDE-based models.
    From Stars to Subgraphs: Uplifting Any GNN with Local Structure Awareness. (arXiv:2110.03753v3 [cs.LG] UPDATED)
    Message Passing Neural Networks (MPNNs) are a common type of Graph Neural Network (GNN), in which each node's representation is computed recursively by aggregating representations (messages) from its immediate neighbors akin to a star-shaped pattern. MPNNs are appealing for being efficient and scalable, however their expressiveness is upper-bounded by the 1st-order Weisfeiler-Lehman isomorphism test (1-WL). In response, prior works propose highly expressive models at the cost of scalability and sometimes generalization performance. Our work stands between these two regimes: we introduce a general framework to uplift any MPNN to be more expressive, with limited scalability overhead and greatly improved practical performance. We achieve this by extending local aggregation in MPNNs from star patterns to general subgraph patterns (e.g., k-egonets): in our framework, each node representation is computed as the encoding of a surrounding induced subgraph rather than encoding of immediate neighbors only (i.e. a star). We choose the subgraph encoder to be a GNN (mainly MPNNs, considering scalability) to design a general framework that serves as a wrapper to uplift any GNN. We call our proposed method GNN-AK (GNN As Kernel), as the framework resembles a convolutional neural network by replacing the kernel with GNNs. Theoretically, we show that our framework is strictly more powerful than 1&2-WL, and is not less powerful than 3-WL. We also design subgraph sampling strategies which greatly reduce memory footprint and improve speed while maintaining performance. Our method sets new state-of-the-art performance by large margins for several well-known graph ML tasks; specifically, 0.08 MAE on ZINC, 74.79% and 86.887% accuracy on CIFAR10 and PATTERN respectively.
    Backplay: "Man muss immer umkehren". (arXiv:1807.06919v5 [cs.LG] UPDATED)
    Model-free reinforcement learning (RL) requires a large number of trials to learn a good policy, especially in environments with sparse rewards. We explore a method to improve the sample efficiency when we have access to demonstrations. Our approach, Backplay, uses a single demonstration to construct a curriculum for a given task. Rather than starting each training episode in the environment's fixed initial state, we start the agent near the end of the demonstration and move the starting point backwards during the course of training until we reach the initial state. Our contributions are that we analytically characterize the types of environments where Backplay can improve training speed, demonstrate the effectiveness of Backplay both in large grid worlds and a complex four player zero-sum game (Pommerman), and show that Backplay compares favorably to other competitive methods known to improve sample efficiency. This includes reward shaping, behavioral cloning, and reverse curriculum generation.
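    The curriculum described above, starting episodes near the end of a demonstration and moving the start point backwards as training proceeds, can be sketched as follows. The linear schedule, the `window` parameter, and the function name `backplay_start_index` are assumptions made for illustration; the abstract does not specify the schedule the authors use.

```python
import random


def backplay_start_index(demo_length, progress, window=3, rng=random):
    """Choose which demonstration step an episode should start from.

    demo_length: number of states in the single demonstration.
    progress:    fraction of training completed, in [0, 1].
    window:      how many steps before the current frontier may also
                 be sampled (hypothetical parameter).
    Early in training the start is near the final demonstration state;
    at progress 1.0 it is the environment's initial state (index 0).
    """
    if demo_length < 1:
        raise ValueError("demonstration must contain at least one state")
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must be in [0, 1]")

    last = demo_length - 1
    # The frontier slides linearly from the last state back to index 0.
    high = round(last * (1.0 - progress))
    low = max(0, high - window)
    return rng.randint(low, high)
```

    A training loop would reset the environment to the demonstration state at the returned index before each episode, which requires an environment that supports resetting to arbitrary recorded states.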
    Conditionally Adaptive Multi-Task Learning: Improving Transfer Learning in NLP Using Fewer Parameters & Less Data. (arXiv:2009.09139v3 [cs.LG] UPDATED)
    Multi-Task Learning (MTL) networks have emerged as a promising method for transferring learned knowledge across different tasks. However, MTL must deal with challenges such as: overfitting to low resource tasks, catastrophic forgetting, and negative task transfer, or learning interference. Often, in Natural Language Processing (NLP), a separate model per task is needed to obtain the best performance. However, many fine-tuning approaches are both parameter inefficient, i.e., potentially involving one new model per task, and highly susceptible to losing knowledge acquired during pretraining. We propose a novel Transformer architecture consisting of a new conditional attention mechanism as well as a set of task-conditioned modules that facilitate weight sharing. Through this construction (a hypernetwork adapter), we achieve more efficient parameter sharing and mitigate forgetting by keeping half of the weights of a pretrained model fixed. We also use a new multi-task data sampling strategy to mitigate the negative effects of data imbalance across tasks. Using this approach, we are able to surpass single task fine-tuning methods while being parameter and data efficient (using around 66% of the data for weight updates). Compared to other BERT Large methods on GLUE, our 8-task model surpasses other Adapter methods by 2.8% and our 24-task model outperforms by 0.7-1.0% models that use MTL and single task fine-tuning. We show that a larger variant of our single multi-task model approach performs competitively across 26 NLP tasks and yields state-of-the-art results on a number of test and development sets. Our code is publicly available at https://github.com/CAMTL/CA-MTL.
    Theoretical Analysis of Self-Training with Deep Networks on Unlabeled Data. (arXiv:2010.03622v5 [cs.LG] UPDATED)
    Self-training algorithms, which train a model to fit pseudolabels predicted by another previously-learned model, have been very successful for learning with unlabeled data using neural networks. However, the current theoretical understanding of self-training only applies to linear models. This work provides a unified theoretical analysis of self-training with deep networks for semi-supervised learning, unsupervised domain adaptation, and unsupervised learning. At the core of our analysis is a simple but realistic "expansion" assumption, which states that a low probability subset of the data must expand to a neighborhood with large probability relative to the subset. We also assume that neighborhoods of examples in different classes have minimal overlap. We prove that under these assumptions, the minimizers of population objectives based on self-training and input-consistency regularization will achieve high accuracy with respect to ground-truth labels. By using off-the-shelf generalization bounds, we immediately convert this result to sample complexity guarantees for neural nets that are polynomial in the margin and Lipschitzness. Our results help explain the empirical successes of recently proposed self-training algorithms which use input consistency regularization.
    Scale Dependencies and Self-Similarity Through Wavelet Scattering Covariance. (arXiv:2204.10177v1 [physics.data-an])
    We introduce a scattering covariance matrix which provides non-Gaussian models of time-series having stationary increments. A complex wavelet transform computes signal variations at each scale. Dependencies across scales are captured by the joint covariance across time and scales of complex wavelet coefficients and their modulus. This covariance is nearly diagonalized by a second wavelet transform, which defines the scattering covariance. We show that this set of moments characterizes a wide range of non-Gaussian properties of multi-scale processes. This is analyzed for a variety of processes, including fractional Brownian motions, Poisson, multifractal random walks and Hawkes processes. We prove that self-similar processes have a scattering covariance matrix which is scale invariant. This property can be estimated numerically and defines a class of wide-sense self-similar processes. We build maximum entropy models conditioned by scattering covariance coefficients, and generate new time-series with a microcanonical sampling algorithm. Applications are shown for highly non-Gaussian financial and turbulence time-series.
    Beyond the density operator and Tr(\rho A): Exploiting the higher-order statistics of random-coefficient pure states for quantum information processing. (arXiv:2204.10031v1 [quant-ph])
    Two types of states are widely used in quantum mechanics, namely (deterministic-coefficient) pure states and statistical mixtures. A density operator can be associated with each of them. We here address a third type of states, that we previously introduced in a more restricted framework. These states generalize pure ones by replacing each of their deterministic ket coefficients by a random variable. We therefore call them Random-Coefficient Pure States, or RCPS. We analyze their properties and their relationships with both types of usual states. We show that RCPS contain much richer information than the density operator and mean of observables that we associate with them. This occurs because the latter operator only exploits the second-order statistics of the random state coefficients, whereas their higher-order statistics contain additional information. That information can be accessed in practice with the multiple-preparation procedure that we propose for RCPS, by using second-order and higher-order statistics of associated random probabilities of measurement outcomes. Exploiting these higher-order statistics opens the way to a very general approach for performing advanced quantum information processing tasks. We illustrate the relevance of this approach with a generic example, dealing with the estimation of parameters of a quantum process and thus related to quantum process tomography. This parameter estimation is performed in the non-blind (i.e. supervised) or blind (i.e. unsupervised) mode. We show that this problem cannot be solved by using only the density operator \rho of an RCPS and the associated mean value Tr(\rho A) of the operator A that corresponds to the considered physical quantity. We succeed in solving this problem by exploiting a fourth-order statistical parameter of state coefficients, in addition to second-order statistics. Numerical tests validate this result.
    Infographics Wizard: Flexible Infographics Authoring and Design Exploration. (arXiv:2204.09904v1 [cs.HC])
    Infographics are an aesthetic visual representation of information following specific design principles of human perception. Designing infographics can be a tedious process for non-experts and time-consuming, even for professional designers. With the help of designers, we propose a semi-automated infographic framework for general structured and flow-based infographic design generation. For novice designers, our framework automatically creates and ranks infographic designs for a user-provided text with no requirement for design input. However, expert designers can still provide custom design inputs to customize the infographics. We will also contribute an individual visual group (VG) designs dataset (in SVG), along with a 1k complete infographic image dataset with segmented VGs in this work. Evaluation results confirm that by using our framework, designers from all expertise levels can generate generic infographic designs faster than existing methods while maintaining the same quality as hand-designed infographics templates.
    Inducing Gaussian Process Networks. (arXiv:2204.09889v1 [cs.LG])
    Gaussian processes (GPs) are powerful but computationally expensive machine learning models, requiring an estimate of the kernel covariance matrix for every prediction. In large and complex domains, such as graphs, sets, or images, the choice of a suitable kernel can also be non-trivial to determine, providing an additional obstacle to the learning task. Over the last decade, these challenges have resulted in significant advances being made in terms of scalability and expressivity, exemplified by, e.g., the use of inducing points and neural network kernel approximations. In this paper, we propose inducing Gaussian process networks (IGN), a simple framework for simultaneously learning the feature space as well as the inducing points. The inducing points, in particular, are learned directly in the feature space, enabling a seamless representation of complex structured domains while also facilitating scalable gradient-based learning methods. We consider both regression and (binary) classification tasks and report on experimental results for real-world data sets showing that IGNs provide significant advances over state-of-the-art methods. We also demonstrate how IGNs can be used to effectively model complex domains using neural network architectures.
    Murmurations of elliptic curves. (arXiv:2204.10140v1 [math.NT])
    We investigate the average value of the $p$th Dirichlet coefficients of elliptic curves for a prime $p$ in a fixed conductor range with given rank. Plotting this average yields a striking oscillating pattern, the details of which vary with the rank. Based on this observation, we perform various data-scientific experiments with the goal of classifying elliptic curves according to their ranks.
    Scalable One-Pass Optimisation of High-Dimensional Weight-Update Hyperparameters by Implicit Differentiation. (arXiv:2110.10461v3 [cs.LG] UPDATED)
    Machine learning training methods depend plentifully and intricately on hyperparameters, motivating automated strategies for their optimisation. Many existing algorithms restart training for each new hyperparameter choice, at considerable computational cost. Some hypergradient-based one-pass methods exist, but these either cannot be applied to arbitrary optimiser hyperparameters (such as learning rates and momenta) or take several times longer to train than their base models. We extend these existing methods to develop an approximate hypergradient-based hyperparameter optimiser which is applicable to any continuous hyperparameter appearing in a differentiable model weight update, yet requires only one training episode, with no restarts. We also provide a motivating argument for convergence to the true hypergradient, and perform tractable gradient-based optimisation of independent learning rates for each model parameter. Our method performs competitively from varied random hyperparameter initialisations on several UCI datasets and Fashion-MNIST (using a one-layer MLP), Penn Treebank (using an LSTM) and CIFAR-10 (using a ResNet-18), in time only 2-3x greater than vanilla training.  ( 2 min )
    Intact-VAE: Estimating Treatment Effects under Unobserved Confounding. (arXiv:2101.06662v3 [stat.ML] UPDATED)
    NOTE: This preprint has a flawed theoretical formulation. Please avoid it and refer to the ICLR22 publication https://openreview.net/forum?id=q7n2RngwOM. Also, arXiv:2109.15062 contains some new ideas on unobserved Confounding. As an important problem of causal inference, we discuss the identification and estimation of treatment effects under unobserved confounding. Representing the confounder as a latent variable, we propose Intact-VAE, a new variant of variational autoencoder (VAE), motivated by the prognostic score that is sufficient for identifying treatment effects. We theoretically show that, under certain settings, treatment effects are identified by our model, and further, based on the identifiability of our model (i.e., determinacy of representation), our VAE is a consistent estimator with representation balanced for treatment groups. Experiments on (semi-)synthetic datasets show state-of-the-art performance under diverse settings.  ( 2 min )
    The NIST CTS Speaker Recognition Challenge. (arXiv:2204.10228v1 [eess.AS])
    The US National Institute of Standards and Technology (NIST) has been conducting a second iteration of the CTS challenge since August 2020. The current iteration of the CTS Challenge is a leaderboard-style speaker recognition evaluation using telephony data extracted from the unexposed portions of the Call My Net 2 (CMN2) and Multi-Language Speech (MLS) corpora collected by the LDC. The CTS Challenge is currently organized in a similar manner to the SRE19 CTS Challenge, offering only an open training condition using two evaluation subsets, namely Progress and Test. Unlike in the SRE19 Challenge, no training or development set was initially released, and NIST has publicly released the leaderboards on both subsets for the CTS Challenge. Which subset (i.e., Progress or Test) a trial belongs to is unknown to challenge participants, and each system submission needs to contain outputs for all of the trials. The CTS Challenge has also served, and will continue to do so, as a prerequisite for entrance to the regular SREs (such as SRE21). Since August 2020, a total of 53 organizations (forming 33 teams) from academia and industry have participated in the CTS Challenge and submitted more than 4400 valid system outputs. This paper presents an overview of the evaluation and several analyses of system performance for some primary conditions in the CTS Challenge. The CTS Challenge results thus far indicate remarkable improvements in performance due to 1) speaker embeddings extracted using large-scale and complex neural network architectures such as ResNets along with angular margin losses for speaker embedding extraction, 2) extensive data augmentation, 3) the use of large amounts of in-house proprietary data from a large number of labeled speakers, 4) long-duration fine-tuning.  ( 2 min )
    Scalable Sensitivity and Uncertainty Analysis for Causal-Effect Estimates of Continuous-Valued Interventions. (arXiv:2204.10022v1 [cs.LG])
    Estimating the effects of continuous-valued interventions from observational data is critically important in fields such as climate science, healthcare, and economics. Recent work focuses on designing neural-network architectures and regularization functions to allow for scalable estimation of average and individual-level dose response curves from high-dimensional, large-sample data. Such methodologies assume ignorability (all confounding variables are observed) and positivity (all levels of treatment can be observed for every unit described by a given covariate value), which are especially challenged in the continuous treatment regime. Developing scalable sensitivity and uncertainty analyses that allow us to understand the ignorance induced in our estimates when these assumptions are relaxed receives less attention. Here, we develop a continuous treatment-effect marginal sensitivity model (CMSM) and derive bounds that agree with both the observed data and a researcher-defined level of hidden confounding. We introduce a scalable algorithm to derive the bounds and uncertainty-aware deep models to efficiently estimate these bounds for high-dimensional, large-sample observational data. We validate our methods using both synthetic and real-world experiments. For the latter, we work in concert with climate scientists interested in evaluating the climatological impacts of human emissions on cloud properties using satellite observations from the past 15 years: a finite-data problem known to be complicated by the presence of a multitude of unobserved confounders.  ( 2 min )
    Exploring Structural Sparsity of Deep Networks via Inverse Scale Spaces. (arXiv:1905.09449v5 [cs.LG] UPDATED)
    The great success of deep neural networks is built upon their over-parameterization, which smooths the optimization landscape without degrading the generalization ability. Despite the benefits of over-parameterization, a huge amount of parameters makes deep networks cumbersome in daily life applications. Though techniques such as pruning and distillation are developed, they are expensive in fully training a dense network as backward selection methods, and there is still a void on systematically exploring forward selection methods for learning structural sparsity in deep networks. To fill in this gap, this paper proposes a new approach based on differential inclusions of inverse scale spaces, which generate a family of models from simple to complex ones along the dynamics via coupling a pair of parameters, such that over-parameterized deep models and their structural sparsity can be explored simultaneously. This kind of differential inclusion scheme has a simple discretization, dubbed Deep structure splitting Linearized Bregman Iteration (DessiLBI), whose global convergence in learning deep networks could be established under the Kurdyka-Lojasiewicz framework. Experimental evidence shows that our method achieves comparable and even better performance than the competitive optimizers in exploring the sparse structure of several widely used backbones on the benchmark datasets. Remarkably, with early stopping, our method unveils 'winning tickets' in early epochs: the effective sparse network structures with comparable test accuracy to fully trained over-parameterized models, that are further transferable to similar alternative tasks. Furthermore, our method is able to grow networks efficiently with adaptive filter configurations, demonstrating a good performance with much less computational cost. Codes and models can be downloaded at https://github.com/DessiLBI2020/DessiLBI.  ( 3 min )
    Out-of-distribution generalization for learning quantum dynamics. (arXiv:2204.10268v1 [quant-ph])
    Generalization bounds are a critical tool to assess the training data requirements of Quantum Machine Learning (QML). Recent work has established guarantees for in-distribution generalization of quantum neural networks (QNNs), where training and testing data are assumed to be drawn from the same data distribution. However, there are currently no results on out-of-distribution generalization in QML, where we require a trained model to perform well even on data drawn from a distribution different from the training distribution. In this work, we prove out-of-distribution generalization for the task of learning an unknown unitary using a QNN and for a broad class of training and testing distributions. In particular, we show that one can learn the action of a unitary on entangled states using only product state training data. We numerically illustrate this by showing that the evolution of a Heisenberg spin chain can be learned using only product training states. Since product states can be prepared using only single-qubit gates, this advances the prospects of learning quantum dynamics using near term quantum computers and quantum experiments, and further opens up new methods for both the classical and quantum compilation of quantum circuits.  ( 2 min )
    Path-Specific Objectives for Safer Agent Incentives. (arXiv:2204.10018v1 [cs.AI])
    We present a general framework for training safe agents whose naive incentives are unsafe. As an example, manipulative or deceptive behaviour can improve rewards but should be avoided. Most approaches fail here: agents maximize expected return by any means necessary. We formally describe settings with 'delicate' parts of the state which should not be used as a means to an end. We then train agents to maximize the causal effect of actions on the expected return which is not mediated by the delicate parts of state, using Causal Influence Diagram analysis. The resulting agents have no incentive to control the delicate state. We further show how our framework unifies and generalizes existing proposals.  ( 2 min )
    Ultra Marginal Feature Importance. (arXiv:2204.09938v1 [stat.ML])
    Scientists frequently prioritize learning from data rather than training the best possible model; however, research in machine learning often prioritizes the latter. The development of marginal feature importance methods, such as marginal contribution feature importance, attempts to break this trend by providing a useful framework for explaining relationships in data in an interpretable fashion. In this work, we generalize the framework of marginal contribution feature importance to improve performance with regards to detecting correlated interactions and reducing runtime. To do so, we consider "information subsets" of the set of features $F$ and show that our importance metric can be computed directly after applying fair representation learning methods from the AI fairness literature. The methods of optimal transport and linear regression are considered and explored experimentally for removing all the information of our feature of interest $f$ from the feature set $F$. Given these implementations, we show on real and simulated data that ultra marginal feature importance performs at least as well as marginal contribution feature importance, with substantially faster computation time and better performance in the presence of correlated interactions and unrelated features.  ( 2 min )

  • Open

    I have a time series of time, temp, humidity, apparent temp, and ac/heater/fan state
    I want to create a NN that takes these readings and makes predictions about "what will the readings be in 5 minutes if I turn the AC on?". I'm thinking of training it with "the angle of the sun at that time", "temp", "humidity", and "ac/heater/fan state"(3) and then extracting data pairs spaced by 5 minutes where the system was in that state for the entire interval. Then I'm thinking I should use the 5-minute-later apparent temp as the training output. So the NN ultimately answers the question "what would be the apparent temperature if the system were to be in the given state for the next 5 minutes?" Am I on the right track here? submitted by /u/HasFiveVowels [link] [comments]  ( 1 min )
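    The pair-extraction step described above can be sketched as follows. This is only an illustration under stated assumptions: the record layout (dicts with the hypothetical keys `t`, `temp`, `humidity`, `apparent`, `state`) and the function name `extract_pairs` are not from the post, readings are assumed to be sorted by time and sampled regularly, and the sun-angle feature is left out for brevity.

```python
def extract_pairs(samples, horizon_s=300):
    """Build (features, target) pairs spaced `horizon_s` seconds apart.

    `samples` is a list of dicts sorted by time, with the hypothetical keys
    't' (seconds), 'temp', 'humidity', 'apparent' and 'state'.
    A pair is kept only if the system state never changed inside the window.
    """
    pairs = []
    for i, start in enumerate(samples):
        for j in range(i + 1, len(samples)):
            later = samples[j]
            if later["state"] != start["state"]:
                break  # state changed inside the window: discard this start point
            if later["t"] - start["t"] >= horizon_s:
                features = (start["temp"], start["humidity"], start["state"])
                pairs.append((features, later["apparent"]))  # target: apparent temp after the horizon
                break
    return pairs


readings = [
    {"t": 0,   "temp": 30.0, "humidity": 0.50, "apparent": 30.0, "state": "ac"},
    {"t": 150, "temp": 29.5, "humidity": 0.50, "apparent": 29.0, "state": "ac"},
    {"t": 300, "temp": 29.0, "humidity": 0.49, "apparent": 28.0, "state": "ac"},
    {"t": 450, "temp": 29.2, "humidity": 0.49, "apparent": 28.5, "state": "off"},
    {"t": 600, "temp": 29.4, "humidity": 0.50, "apparent": 29.0, "state": "off"},
]
pairs = extract_pairs(readings)
```

    With irregular sampling, the first reading at or after the horizon may lie well beyond 5 minutes, so a tolerance check would be needed there.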
  • Open

    An AI painting some colorful pitbulls
    submitted by /u/p0goniphaft111 [link] [comments]
    Building A Pictionary App (sketch recognition model) with Gradio
    submitted by /u/Illustrious_Row_9971 [link] [comments]
    Is there an AI which I can use to edit images (selfies etc.)?
    For example, I would mark some areas of my pictures and then select what should happen to them? submitted by /u/xXLisa28Xx [link] [comments]
    What AI can I use to make caricatures from pictures from people?
    Is artbreeder the best way to do it, or is there a better way? submitted by /u/xXNOdrugsForMEXx [link] [comments]  ( 1 min )
    Learning or working with AI? Come join us, we are a Discord Community with over 20'000 members! Ask questions, find teammates, share your projects, attend events, and much more to come!
    Programming is way more fun when you learn/work with someone. Help each other, ask questions, brainstorm, etc. There is just so much benefit to joining a community when you are in this field, especially when you cannot find the question you are looking for on stack overflow! 😉 This is the same thing with AI, and it is why a little less than two years ago I created a discord server. Where anyone learning or working in the field could come and share their projects, learn together, work together, and much more. The community has now over 20 000 members, which is unbelievable! So glad to see it growing and see everyone so active. We also have an amazing partnership with an AI company coming that is super exciting for the community. You definitely want to be there to enjoy all the benefits they will give us. Come join us if you are in the field of AI ! https://discord.gg/learnaitogether submitted by /u/OnlyProggingForFun [link] [comments]  ( 1 min )
    Analyse sentiment/tonality in social networks
    submitted by /u/akolonin [link] [comments]  ( 1 min )
    [Research] Explaining the Black Box Optimization Competition Winner Algorithm-HEBO Algorithm of AI Top Conference NeurIPS 2020
    submitted by /u/Creative_Habit_6868 [link] [comments]  ( 3 min )
  • Open

    [R] GAM(e) changer or not? An evaluation of interpretable machine learning models based on additive model constraints
    https://arxiv.org/abs/2204.09123 https://www.researchgate.net/publication/360079336_GAMe_changer_or_not_An_evaluation_of_interpretable_machine_learning_models_based_on_additive_model_constraints submitted by /u/Positive_Ad_1090 [link] [comments]
    How can you differentiate Kornia SIFT descriptor? [P]
    Kornia is a differentiable library for computer vision based on PyTorch. Does anyone have experience with their SIFT descriptor. What can you differentiate? submitted by /u/avd4292 [link] [comments]
    [D] Evaluating and Selecting Models: Based on Loss or Metrics?
    When it comes to evaluating and selecting a model, should one focus on minimizing loss (e.g., sparse categorical cross-entropy) or on obtaining high metric scores (e.g., F1)? Often the model with the best metrics has a higher loss on the validation/test sets than models with worse metrics. Some would say to focus on metrics, since the loss only exists for the machine to optimize during learning: what stays in training, stays in training. However, wouldn't loss also be an important element to consider, since it also describes the performance of the model, particularly when obtained from the test set? How should one prioritize? Should metrics or loss rule all, or should one seek a balance? submitted by /u/Hydraze [link] [comments]  ( 1 min )
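    The disagreement between loss and metrics described above can be reproduced on a toy example. The sketch below is illustrative only: the two "models" are hand-written probability tables (not from the post), and accuracy stands in for F1 to keep the code short.

```python
import math


def cross_entropy(probs, labels):
    """Mean negative log-probability assigned to the true class."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


def accuracy(probs, labels):
    """Fraction of examples whose highest-probability class equals the label."""
    hits = sum(1 for p, y in zip(probs, labels)
               if max(range(len(p)), key=p.__getitem__) == y)
    return hits / len(labels)


labels = [0, 0, 0, 0]
# Hypothetical model A: always right, but never confident.
model_a = [[0.55, 0.45]] * 4
# Hypothetical model B: very confident on three examples, wrong on the fourth.
model_b = [[0.99, 0.01], [0.99, 0.01], [0.99, 0.01], [0.40, 0.60]]

acc_a, acc_b = accuracy(model_a, labels), accuracy(model_b, labels)
loss_a, loss_b = cross_entropy(model_a, labels), cross_entropy(model_b, labels)
```

    Here model A has the better accuracy while model B has the lower cross-entropy, because the loss rewards calibrated confidence and the metric only looks at the arg-max decision.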
    [R] Optimize clustering for downstream task
    Assume we have a 2-step algorithm: 1) aggregate data points into clusters, 2) feed the clusters to a downstream task (e.g. classification, regression, etc). Is there any work that explores how to optimize the clustering in 1) to achieve the best performance in the downstream task 2)? One example would be a differentiable clustering algorithm that receives gradients from the downstream task or a parametrized clustering algorithm whose parameters are automatically tuned to increase the performance of the downstream task. I have found very little on this topic in the literature, could you point me to some relevant work? submitted by /u/fedetask [link] [comments]  ( 1 min )
    [D] What is a good emoji aware pre-trained language model?
    I am classifying social media posts (Facebook, Instagram), with emojis making up as much as 100% of the content. For example, you may want to tag "🤮🤮🤮" as in need of moderation, and "🤔🤔🤔" as prioritized for a response. Looking for a good model to fine-tune, I found BERTweet, which seems at least somewhat emoji aware. However, it also has a ton of out-of-vocabulary results, both for emoji and semi-common English words, despite its liberal use of emoji.demojize and splitting up more complex emoji: https://preview.redd.it/t6ai3o8le1v81.png?width=687&format=png&auto=webp&s=c16157addbe1b3d34858708f3e6c7517e64d26ec A model like `xlm-roberta-base` with a larger vocabulary (250k) and more robust tokenization seems to have some 500 emoji directly in its vocabulary, without converting them to text. This seems potentially more promising, but also guarantees a token like 🤮 is just out of vocabulary rather than being interpreted by word pieces. Has anyone here had experience with dealing with emoji in text classification, and what approaches were most successful? submitted by /u/sanderbaduk [link] [comments]  ( 2 min )
    [D] What is the best way to use a metric network when fine-tuning after contrastive learning?
    Hi, I have a question about how to use a metric network after contrastive learning. If I have trained a network well with an NCE loss, I would like to fine-tune this network so that it matches each input to its best output (the pairs used when calculating the NCE loss). Is there any good way to do it? Thank you for reading! submitted by /u/Spiritual_Fig3632 [link] [comments]  ( 1 min )
    [D] Opinions needed - Anyone interested in mock peer review?
    We’d like to know if anyone is interested in participating in a mock peer review? Basically if you have a paper you’d like to get feedback on, and would like to review others’ papers in exchange, you’re welcome to continue reading. We are gauging public interest in mock peer review and exploring the possibility to host the reviews on DouBlind. We’d like to know your answers to the following questions: Are you interested in mock peer review? Do you want to do this privately (paper and review are kept inside a small group) or openly (paper and review are open)? How many papers do you like to review? Do you have any concerns? submitted by /u/DouBlindDotCOM [link] [comments]  ( 2 min )
    [Research] Explaining the Black Box Optimization Competition Winner Algorithm-HEBO Algorithm of AI Top Conference NeurIPS 2020
    This is reproduced from Zhihu and translated by DeepL, and is only intended for enthusiasts to communicate. MindSpore, as an end-to-edge cloud collaborative full-scenario AI open source framework, takes into account the flexibility of academic research and the high-performance needs of industry, supports end-to-edge cloud full-scene business, and brings developers simpler programming, easier debugging, superior performance, and a more flexible deployment experience. It has received widespread attention and application in the industry, has been open source since 2020.3.28, and is the open source software with the highest index on Gitee. Welcome to participate in open source contributions, model crowdsourcing collaboration, industry innovation and application, algorithm innovation, academic collabora…  ( 3 min )
  • Open

    Data Pre-Processing in TF-Agents
    Hi everyone, this is my first post, so please go easy on me. I'm currently playing around with a bigger model in TF-Agents. Until now I have only worked with structured data (TF, scikit-learn, pandas...). Now I'm struggling a bit with the pre-processing and where in the architecture to place it. I use multiple inputs and an encoding layer for each of them. For the training of the encoders I used some scikit-learn pre-processors (StandardScaler, MinMaxScaler, KBinsDiscretizer). I am trying to reuse the pre-processing pipeline in the model, or to extract its information for other pre-processing mechanisms (e.g. pre-processing TF layers). The options I have come up with so far:
    Option 1: Incorporate it directly into the environment and return the pre-processed observation.
        Pro: easy, and I can probably use my scikit-learn pipeline.
        Contra: I'd like to keep the architecture clean, so the environment should only give out raw values and not values prepared for a certain model.
    Option 2: Use an environment wrapper around the "raw" environment.
        Pro: the "raw" environment needs no tuning.
        Contra: not sure if I can use my pipeline here, and not sure if I'm taking a bad path.
    Option 3: Use pre-processing TF layers.
        Pro: most of the API is there and can be used in my encoder networks; this seems to be the TF-idiomatic way.
        Contra: the scikit-learn pre-processors have values for each column. The layers (e.g. rescale) seem to take only one configuration for a whole tensor. I could probably create a layer for each value in the tensor, but that doesn't feel like how it is supposed to be done.
    If you have more options or can share your experience with one of the options mentioned above, I would be very glad. submitted by /u/Kjiessar [link] [comments]  ( 1 min )
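    On the per-column concern above: a single standardization step can carry one mean and one variance per column, so one layer per value is not strictly necessary. The sketch below shows the idea in plain Python; the function names are hypothetical. As far as I know, a fitted `StandardScaler` exposes such per-column statistics as `mean_` and `var_`, and the Keras `Normalization` layer accepts `mean` and `variance` arguments per feature, but treat that mapping as an assumption to verify against your installed versions.

```python
def fit_column_stats(rows):
    """Per-column mean and population variance (ddof=0) of a list of rows."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    variances = [sum((v - m) ** 2 for v in col) / n
                 for col, m in zip(columns, means)]
    return means, variances


def standardize(row, means, variances, eps=1e-12):
    """Apply the per-column statistics to one observation in a single step."""
    return [(v - m) / (var + eps) ** 0.5
            for v, m, var in zip(row, means, variances)]


# Two columns on very different scales share one "layer" of statistics.
train_rows = [[1.0, 100.0], [3.0, 300.0]]
means, variances = fit_column_stats(train_rows)
scaled = standardize([3.0, 300.0], means, variances)
```

    The same vector of statistics could live in the environment wrapper or in the network, so the choice between options 2 and 3 need not depend on per-column support.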
    Masking in RNN in the actor network
    I am using PPO in the context of multi-agent RL. I was wondering if PyTorch has a way of handling when hidden states should be reinitialized to zeros. What I have found is this implementation:

        def forward(self, x, hxs, masks):
            if x.size(0) == hxs.size(0):
                x, hxs = self.rnn(x.unsqueeze(0),
                                  (hxs * masks.repeat(1, self._recurrent_N).unsqueeze(-1)).transpose(0, 1).contiguous())
                x = x.squeeze(0)
                hxs = hxs.transpose(0, 1)
            else:
                # x is a (T, N, -1) tensor that has been flatten to (T * N, -1)
                N = hxs.size(0)
                T = int(x.size(0) / N)

                # unflatten
                x = x.view(T, N, x.size(1))

                # Same deal with masks
                masks = masks.view(T, N)

                # Let's figure out which steps in the sequence have a zero for any agent
                # We will always assume t=0 has a zero in it as that makes the logic cleaner
                has_zeros = ((masks[1:] == 0.0)
                             .any(dim=-1)
                             .nonzero()
                             .squeeze()
                             .cpu())

                # +1 to correct the masks[1:]
                if has_zeros.dim() == 0:
                    # Deal with scalar
                    has_zeros = [has_zeros.item() + 1]
                else:
                    has_zeros = (has_zeros + 1).numpy().tolist()

                # add t=0 and t=T to the list
                has_zeros = [0] + has_zeros + [T]

                hxs = hxs.transpose(0, 1)

                outputs = []
                for i in range(len(has_zeros) - 1):
                    # We can now process steps that don't have any zeros in masks together!
                    # This is much faster
                    start_idx = has_zeros[i]
                    end_idx = has_zeros[i + 1]
                    temp = (hxs * masks[start_idx].view(1, -1, 1).repeat(self._recurrent_N, 1, 1)).contiguous()
                    rnn_scores, hxs = self.rnn(x[start_idx:end_idx], temp)
                    outputs.append(rnn_scores)

                # assert len(outputs) == T
                # x is a (T, N, -1) tensor
                x = torch.cat(outputs, dim=0)

                # flatten
                x = x.reshape(T * N, -1)
                hxs = hxs.transpose(0, 1)

    submitted by /u/No_Possibility_7588 [link] [comments]  ( 2 min )
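    The chunking logic in that implementation can be isolated in a few lines of plain Python. This is a simplified sketch, not the quoted code: `segment_boundaries` and `masked_rollout` are hypothetical names, and the "RNN" is a toy running sum for a single agent, used only to show where the hidden state gets zeroed.

```python
def segment_boundaries(masks):
    """masks[t][n] == 0.0 means agent n was reset right before step t.

    Returns indices [0, ..., T] so that each chunk [start, end) contains
    a reset for some agent only at its first step (or not at all).
    """
    T = len(masks)
    resets = [t for t in range(1, T) if any(m == 0.0 for m in masks[t])]
    return [0] + resets + [T]


def masked_rollout(xs, masks, h0):
    """Toy single-agent recurrence: the hidden state is a running sum."""
    bounds = segment_boundaries([[m] for m in masks])
    h, outputs = h0, []
    for start, end in zip(bounds[:-1], bounds[1:]):
        h = h * masks[start]        # zero the state if this chunk starts with a reset
        for x in xs[start:end]:     # a real RNN would consume the whole chunk in one call
            h = h + x
            outputs.append(h)
    return outputs


bounds = segment_boundaries([[1.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
outputs = masked_rollout([1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 0.0, 1.0], h0=5.0)
```

    Multiplying the hidden state by the mask at the start of each chunk is what performs the reset; the chunking only exists so that steps without resets can be processed in a single batched call.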
    Does anyone know of a chess environment written in JAX?
    I don't think an opensource one exists but figured I'd ask here because you never know what's laying around the internet! As an aside, if one doesn't exist, let me know if you're interested in partnering in writing one! Edit: For anyone wondering I need the env to be in jax because my muzero implementation is in jax and I need the env to run on TPU cores, not CPU submitted by /u/evanatyourservice [link] [comments]  ( 1 min )
    Papers that use neural networks solely for planning in large MDPs (i.e., no learning)
    I am looking for any papers that do the following: use neural networks in the RL pipeline as the state space is too large for calculating the optimal policy using the traditional tabular value iteration or policy iteration. In this setting, the model is completely known, i.e., no learning. Most papers I see with DeepRL assume that the transition probabilities are unknown and that they have access to a simulator that gives them the ability to query data points. I am looking for existing work in DeepRL where the transition probabilities are known but the problem is intractable using tabular methods. Any direction would be appreciated, thanks! submitted by /u/lolillini [link] [comments]  ( 1 min )
    PPO update without using NNs / batch updates
    Hello, I'm making a new post as I couldn't find any answers to this before (although this reddit post is similar to my issue). I am trying to implement a simple multivariate Gaussian policy without neural networks, basically using a standard policy gradient update with SGD + score function gradient, without batches. The reason for this is to avoid unstable updates, meaning too large updates in mean/variance. The idea is thus to use a trust region update, to keep the updates within some reasonable size. I am a little confused regarding the maximization of the surrogate objective. As seen in this stackoverflow post, we wish to maximize [pi/pi_old], compared to [log(pi)] in vanilla PG. Since I do not use automatic differentiation, but a single stochastic descent step, how do I find the gradient of pi/pi_old? To my understanding, the flow of the algorithm is this: sample experience -> compute new policy parameters -> compare with previous policy -> construct surrogate function -> perform SGD on surrogate to get the actual new policy. It is the last step I am struggling with. submitted by /u/Acrobatic-Ad-9189 [link] [comments]  ( 1 min )
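    For the gradient of pi/pi_old asked about above, the log-derivative identity gives a closed form that needs no automatic differentiation. The block below is a sketch under assumptions: $\hat{A}$ denotes an advantage (or return) estimate, which the post does not name, $\pi_{\text{old}}$ is held fixed, and the policy is taken to be $\mathcal{N}(\mu, \Sigma)$ with $\theta = (\mu, \Sigma)$.

```latex
\nabla_\theta \left[ \frac{\pi_\theta(a \mid s)}{\pi_{\text{old}}(a \mid s)} \, \hat{A} \right]
  = \frac{\pi_\theta(a \mid s)}{\pi_{\text{old}}(a \mid s)} \,
    \nabla_\theta \log \pi_\theta(a \mid s) \, \hat{A}

% Score function of a multivariate Gaussian policy:
\nabla_\mu \log \pi_\theta(a \mid s) = \Sigma^{-1} (a - \mu)

\nabla_\Sigma \log \pi_\theta(a \mid s)
  = \tfrac{1}{2} \left( \Sigma^{-1} (a - \mu)(a - \mu)^\top \Sigma^{-1} - \Sigma^{-1} \right)
```

    At $\theta = \theta_{\text{old}}$ the ratio equals 1, so the first step coincides with the vanilla policy gradient; the ratio only starts to matter when several updates reuse the same sample.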
    Simulating robotic arm for object manipulation
    I'll be starting work on object manipulation using deep RL, and I would like to start from scratch. Please recommend sources, tools, and software used for this purpose. I will not be working on modeling the robot; instead I will be using any robot with a gripper that can be interfaced with ROS. Also, please link GitHub repositories that could be helpful in the learning process. Thanks submitted by /u/Western-Age3148 [link] [comments]  ( 1 min )
    policy-encoding mapping implementation
    Hi, I want to understand the policy-encoding mapping e : (S → A) → R^k in Universal Successor Features Approximators. I don't know how to embed one network as an input to another network. There are too many weights! Do you have any ideas? Thank you for reading! submitted by /u/Spiritual_Fig3632 [link] [comments]  ( 1 min )
    Useful Tools and Resources for Reinforcement Learning
    Found a useful list of Tools, Frameworks, and Resources for RL/ML. It covers Reinforcement learning, Machine Learning (TensorFlow & PyTorch), Core ML, Deep Learning, Computer Vision (CV). I thought I'd share it for anyone that's interested submitted by /u/Khaotic_Kernel [link] [comments]  ( 1 min )
  • Open

    Pix2Seq: A New Language Interface for Object Detection
    Posted by Ting Chen and David Fleet, Research Scientists, Google Research, Brain Team Object detection is a long-standing computer vision task that attempts to recognize and localize all objects of interest in an image. The complexity arises when trying to identify or localize all object instances while also avoiding duplication. Existing approaches, like Faster R-CNN and DETR, are carefully designed and highly customized in the choice of architecture and loss function. This specialization of existing systems has created two major barriers: (1) it adds complexity in tuning and training the different parts of the system (e.g., region proposal network, graph matching with GIOU loss, etc.), and (2), it can reduce the ability of a model to generalize, necessitating a redesign of the model for…  ( 7 min )
  • Open

    Secure AWS CodeArtifact access for isolated Amazon SageMaker notebook instances
    AWS CodeArtifact allows developers to connect internal code repositories to upstream code repositories like PyPI, Maven, or npm. AWS CodeArtifact is a powerful addition to CI/CD workflows on AWS, but it is similarly effective for code-bases hosted on a Jupyter notebook. This is a common development paradigm for Machine Learning developers that build and train […]  ( 9 min )
  • Open

    Web Frameworks for Your Python Projects
    When we finish a Python project and roll it out for other people to use, the easiest way is to present our project as a command line program. If you want to make it friendlier, you may want to develop a GUI for your program so people can interact with the program with mouse clicks […] The post Web Frameworks for Your Python Projects appeared first on Machine Learning Mastery.  ( 36 min )
  • Open

    7 Ways Your Business Can Plan For Artificial Intelligence
    Artificial Intelligence is all over the world today. From the use of virtual assistants like Siri, Alexa, or Cortana, to improving…  ( 2 min )
  • Open

    By Land, Sea and Space: How 5 Startups Are Using AI to Help Save the Planet
    Different parts of the globe are experiencing distinct climate challenges — severe drought, dangerous flooding, reduced biodiversity or dense air pollution. The challenges are so great that no country can solve them on their own. But innovative startups worldwide are lighting the way, demonstrating how these daunting challenges can be better understood and addressed with Read article > The post By Land, Sea and Space: How 5 Startups Are Using AI to Help Save the Planet appeared first on NVIDIA Blog.  ( 3 min )

  • Open

    Last Week in AI: Chip Startup Funding Doubled, Google Text+Image Search, Analog AI, Criminal Robotaxi
    submitted by /u/regalalgorithm [link] [comments]
    AI Dream 31 - Spaceships Galore Planet VQGAN CLIP
    submitted by /u/LordPewPew777 [link] [comments]
    What would be the best approach to auto-generate comic panels (Garfield style) with drawings and speech bubbles, assuming I have tons of scans to use as training?
    I'm a software developer but I'm not really experienced in AI. Would it be best to train first for speech bubbles and separately for panel drawings? What kind of network is the best for this? Just thinking that it would be a cool project to have auto generated legible infinite comic strips for a semi niche comic strip that runs in my country. submitted by /u/dananite [link] [comments]  ( 1 min )
    Any Recommendations for AI Content Generation Software?
    Content generation is such a time-suck for small businesses, and it seems like an interesting vertical to apply AI. The AI would generate the content after being given a prompt. There are already a few tools trying this, but the quality doesn't seem to be very high. Are there better tools that I'm missing, or is the consumer-facing software so early-stage that it would be better to hire a data scientist and train an AI system specifically for this purpose? https://www.reddit.com/r/MachinesWrite/comments/f45eav/list_of_ai_text_generators/?utm_source=share&utm_medium=web2x&context=3 https://www.reddit.com/r/juststart/comments/axa8w3/ai_ml_text_generators/?utm_source=share&utm_medium=web2x&context=3 submitted by /u/CliffWoolum [link] [comments]  ( 1 min )
    Looking for enterprise conversational AI platform
    submitted by /u/sunstormfirefall [link] [comments]  ( 1 min )
    VICReg: Tutorial and Lightweight PyTorch Implementation blog post
    Here's a tutorial and lightweight PyTorch implementation of VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning. Hope you find it helpful! submitted by /u/thejashGI [link] [comments]
    Microsoft AI Researchers Develop ‘Ekya’ To Address The Problem Of Data Drift On The Edge Compute Box And Enables Both Retraining And Inference To Co-Exist On It
    Deep neural network (DNN) models for object recognition and classification, such as Yolo, ResNet, and EfficientNet, are used in video analytics applications such as urban mobility and smart automobiles. There is a symbiotic link between edge computing and video analytics, and live video analytics has been called the “killer app” for edge computing. Edge devices come in various sizes and designs, but they are always resource-constrained compared to the cloud. Video analytics deployments send the videos to on-premises edge servers. The article addresses the difficulty of supporting inference and retraining jobs on edge servers simultaneously, which necessitates navigating the fundamental tradeoff between the accuracy of the retrained model and the accuracy of the inference. Edge computation is preferred for video analytics because it eliminates the need for expensive network links to stream videos to the cloud while also preserving video privacy. Edge computation has a finite amount of resources (e.g., weak GPUs). The mismatch between the growth rate of model compute needs and the total cycles of processors exacerbates this problem. As a result, model compression is used in edge deployments. Continue reading our bite on this research Paper: https://www.microsoft.com/en-us/research/uploads/prod/2021/07/nsdi22spring-final74.pdf Github: https://github.com/edge-video-services/ekya submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    Is there an AI that can turn normal videos into sketches like the video below?
    submitted by /u/TheblackRook3 [link] [comments]  ( 1 min )
    Automatic Summaries of your Documents in Google Docs !
    submitted by /u/OnlyProggingForFun [link] [comments]
    How to achieve a training duration on MindSpore that's less than or equal to that on TensorFlow?
    submitted by /u/Creative_Habit_6868 [link] [comments]  ( 1 min )
    CypherZilla - The First Encoded NFT Made By AI To Support Trump. Upvote If You Want To Have A Huge Impact!
    CypherZilla on OpenSea https://reddit.com/link/u8he59/video/i9a29i5ywtu81/player submitted by /u/thecypherbeast [link] [comments]
    What price do we have to pay for progress in AI? Have a look
    https://www.sganalytics.com/blog/top-ethical-challenges-in-ai-the-price-of-progress/ submitted by /u/JencyJane [link] [comments]
    Collaboration AI video and music
    submitted by /u/Recent_Coffee_2551 [link] [comments]
  • Open

    Question about trained models
    Hello, I have a question. For example, in the case of an inverted pendulum or cartpole, I train the model to keep the pole at 0 degrees (vertical) and it works. Then I want this same model to keep the pole at another position, for example 3 degrees. Do I have to train the model again to achieve this, or can I somehow reuse the model I already trained and what it learnt, and input the new position I want? I don't know if I explained myself well; I guess it's mostly doubts about how to interact with a model and how to properly use one that has already been trained. If anyone has some example code (Python, Gym) for interacting with a trained model, it would be really helpful. submitted by /u/Sleyck [link] [comments]  ( 1 min )
    Why is this implementation of PPO using a replay buffer?
    https://github.com/marlbenchmark/on-policy/blob/main/onpolicy/algorithms/r_mappo/r_mappo.py submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
    What is the role of masks in the computation of GAE?
    submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
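    One common reading of the question above, sketched under assumptions: in many PPO codebases the mask is 0 where an episode ended and 1 otherwise, so it stops both the bootstrap value and the running advantage from leaking across episode boundaries. The function name `compute_gae`, the argument layout, and the default hyperparameters are illustrative and are not taken from any repository linked in this feed.

```python
def compute_gae(rewards, values, masks, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout (illustrative sketch).

    rewards: list of T rewards.
    values:  list of T + 1 value estimates (the last one is the bootstrap value).
    masks:   list of T floats; masks[t] is 0.0 if the episode ended at step t,
             otherwise 1.0 (an assumed convention, check your own codebase).
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        # The mask zeroes the bootstrap term when step t ended an episode.
        delta = rewards[t] + gamma * values[t + 1] * masks[t] - values[t]
        # The mask also resets the accumulated advantage at episode boundaries.
        gae = delta + gamma * lam * masks[t] * gae
        advantages[t] = gae
    return advantages


if __name__ == "__main__":
    # Episode ends at step 1, so step 2 must not influence steps 0 and 1.
    print(compute_gae([1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0], [1.0, 0.0, 1.0],
                      gamma=1.0, lam=1.0))
```

    Without the mask, the advantage at steps 0 and 1 would also pick up reward from the next episode stored in the same rollout buffer.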
    Question About Optimal Policy Guarantees in POMDPs
    I'm working on a project where I'm trying to prove the existence of a particular set of functions by showing it can be constructed as the solution to a Markov Decision Process. However, it seems that it's much simpler to convert it to a partially observable MDP, rather than a classic one. I know it's been proven that the set of optimal policies for a classic MDP is nonempty, and intuitively I feel like the same should hold for POMDPs, but I'm having a hard time finding a particular source proving such a thing. Does anyone know where I ought to look? submitted by /u/LessPoliticalAccount [link] [comments]  ( 1 min )
    Can reinforcement learning learn itself? A reply to 'Reward is enough' (PDF)
    submitted by /u/JBaloney [link] [comments]  ( 1 min )
    What is this line in the Sutton/Barto textbook referring to?
    In the first edition of the textbook, the section on actor-critic methods (link) describes the classical approach of using the temporal difference error 𝛿 to modify the probability of selecting action a in state s: https://preview.redd.it/fa35vut7ewu81.png?width=238&format=png&auto=webp&s=c1b8952b065a90ecd2b8c0c30b985e36d37dbc30 Then they briefly mention that one variation on the classical approach is to scale temporal difference error 𝛿 by the inverse of the probability of selecting the action a, where that probability is given by 𝜋(s, a): https://preview.redd.it/69gd3zmbdwu81.png?width=375&format=png&auto=webp&s=b66a7d5eef3c2b7bc256473aed728223921a751c They say: " These issues were explored early on, primarily for the immediate reward case (Sutton, 1984; Williams, 1992) and have not been brought fully up to date." This idea is relevant to a project I'm working on, and I'd like to read more about it. But the references seem to be dead ends: Sutton 1984 is his PhD thesis, which I can't find a digital copy of, and Williams 1992 is this paper, which doesn't seem to contain this idea. Also this section doesn't seem to appear in the second edition of the textbook. You folks are much smarter than I am: Does modifying the update in this way mean anything to you? Are there modern approaches that do something like this? Or should I assume it was a little-explored idea in the early days that has been more-or-less forgotten? Thanks very much! submitted by /u/Careless-Argument-37 [link] [comments]  ( 2 min )
    Reinforcement Learning with delays
    I was wondering what methods there are for RL with time delay other than augmenting the state space with the action buffer or using a model to undelay the environment. I've seen this post How to deal with the time delay in reinforcement learning? - Artificial Intelligence Stack Exchange however it's rather brief and I wondered if there were any more recent advancements. I am also struggling to understand partial trajectory resampling ( 2010.02966.pdf (arxiv.org) ) and the code in the accompanying repo. GitHub - rmst/rlrd: PyTorch implementation of our paper Reinforcement Learning with Random Delays (ICLR 2020) I was wondering how we can resample actions in environments with constant delays if those actions are used in the state space for all subsequent chosen actions? submitted by /u/SuperDuperDooken [link] [comments]  ( 1 min )
    Is it stupid to use rl to control solar panel angle?
    submitted by /u/Professional_Card176 [link] [comments]  ( 1 min )
    How can I use the environment in Emergence of Locomotion Behaviours in Rich Environments?
    Hi, I want to train my agent in the environment used in "Emergence of Locomotion Behaviours in Rich Environments". Here is a video about that https://www.youtube.com/watch?v=hx_bgoTF7bs. Is the environment released? Thanks for reading. submitted by /u/Spiritual_Fig3632 [link] [comments]  ( 1 min )
  • Open

    [P] mGPT model released: a multilingual GPT-3-like model for 61 languages
    Hi everyone. Today we released the mGPT model: multilingual generative pre-trained transformer The checkpoints are available on Huggingface model page The example usage is at the Github repo https://github.com/ai-forever/mgpt The model has 1.3 billion parameters The context length is 512 tokens. The model can generate sequences after the input prompt, can be used for fine-tuning or for zero- and few-shot learning: from transformers import GPT2LMHeadModel, GPT2Tokenizer model_name = "sberbank-ai/mGPT" tokenizer = GPT2Tokenizer.from_pretrained(model_name) model = GPT2LMHeadModel.from_pretrained(model_name) model.cuda() model.eval() texts = [ "My favourite holiday is ", "Իմ սիրելի տոնն է ", "Моє улюблене свято ", "mi fiesta favorita es ", "मेरी पसंदीदा छुट्टी है", "我最喜欢的节日是", "Min…  ( 2 min )
    [P] Deep Learning GPU Benchmark: A Latency-Based Approach
    Hi r/MachineLearning! I want to share with you a fun side project of mine on benchmarking the GPUs for deep learning: [project page]. https://preview.redd.it/7olwqyze5yu81.png?width=2041&format=png&auto=webp&s=25aecb9733366720a2be5cecc2048eb2a734c9b9 Here are some key features: It helps to estimate the runtime of algorithms on a different GPU. It measures GPU processing speed independent of GPU memory capacity. It contains adjustable weightings through interactive UIs. The project page also explains how this benchmark differs from existing ones, and why this benchmark is more relevant to academic research. I would love to know what you think! submitted by /u/roll-a-dice [link] [comments]  ( 1 min )
    [D] What's your perfect laptop for deep learning research?
    I'm using a 2015 MBP. It's a pretty solid laptop and I like it a lot, though it feels slow and I've started to look for a replacement. Given that I run all experiments on dedicated GPU servers, my laptop serves me as a typewriter. That's OK, but I'd like to get more out of it. Frankly I'm a bit disappointed by the 2021 MacBooks; I hope they'll be improved in 2022. Recently Lambda Labs together with Razer announced their Tensorbook https://lambdalabs.com/deep-learning/laptops/tensorbook . Their pricing looks weird to me: the more you pay, the more years of support you get, and that's the only thing that differentiates the base bundle from enterprise. Also there is no option to customize the hardware, though the base bundle itself looks OK; its price is $3500, like the M1 Max's. What's your opinion about this laptop in particular? Would you buy it? Generally this laptop looks like a cool thing to have for local model development, even from a tent somewhere in Nepal, given that you have enough power banks to charge it. :) What's your choice of laptop for DL? My biggest requirement is a durable laptop that will serve at least 5 years, preferably with an NVIDIA GPU for development and debugging. submitted by /u/taras-sereda [link] [comments]  ( 1 min )
    [R] Deep models of superficial face judgments (PNAS)
    Transformations that alter the perception of target faces Paper: https://www.pnas.org/doi/10.1073/pnas.2115228119 Dataset: https://onemillionimpressions.com/ submitted by /u/joshuacpeterson [link] [comments]
    [R] Planting Undetectable Backdoors in Machine Learning Models
    submitted by /u/Wiskkey [link] [comments]
    [P] VICReg: Tutorial and Lightweight PyTorch Implementation blog post
    Here's a tutorial and lightweight PyTorch implementation of VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning. Hope you find it helpful! submitted by /u/thejashGI [link] [comments]
    [P] Announcing cleanlab 2.0: Automatically Find Errors in ML Datasets
    Hi folks. This morning I released the new cleanlab 2.0 Python package for automatically finding errors in datasets and machine learning/analytics with real-world, messy data and labels. tl;dr - cleanlab provides a framework to streamline data-centric AI. https://preview.redd.it/hq1kyasvwwu81.png?width=2279&format=png&auto=webp&s=4fa3c82ec66d685c8fc4f95c5d9a0fc4be192d6b After the 1.0 launch last year, engineers used cleanlab at Google to clean and train robust models on speech data, at Amazon to estimate how often the Alexa device doesn’t wake, at Wells Fargo to train reliable financial prediction models, and at Microsoft, Tesla, Facebook, etc. Joined by two good friends from grad school, we completely rebuilt cleanlab 2.0 to work for all data scientists, ML datasets, and models; and hit a…  ( 2 min )
    [P] Galp Hackathon - Win 10.000€ from home!
    If you are passionate about Data & AI we have the perfect challenge for you! The applications for Galp’s Hackathon Retail 4.0 are OPEN! With this Hackathon, Galp is challenging the community to propose solutions to specific problems and use cases that they think could improve their typical customer journey in the service stations. Gather a team and come up with an innovative solution for a chance of winning 10.000€! Let’s shape the future of Galp's retail? Apply now: https://taikai.network/en/galp/hackathons/retail40 https://preview.redd.it/wkfb6ybuwwu81.png?width=3334&format=png&auto=webp&s=deef13767df5ba607e387ce4e278ae3981d93582 submitted by /u/migueldsalmeida [link] [comments]  ( 1 min )
    [D] Imbalanced multi class classification 📌
    I'm working on a machine learning problem: multi-class classification with an imbalanced class distribution. Unsurprisingly, my model favours the classes with more data and fails to predict the classes with little data. What techniques can I use to help the model distinguish all the classes equally well? P.S. I'm avoiding SMOTE so that the model trains on real data rather than generated data. submitted by /u/According-Promise-23 [link] [comments]  ( 2 min )
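    One option that uses only the real samples is to reweight the loss per class. A minimal sketch, assuming inverse-frequency weights of the form n_samples / (n_classes * class_count), which is the same heuristic scikit-learn applies for `class_weight="balanced"`; the helper name `balanced_class_weights` is hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return a dict mapping each class to n_samples / (n_classes * class_count).

    Rare classes get weights above 1 and frequent classes get weights below 1,
    so each class contributes roughly equally to a weighted loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


if __name__ == "__main__":
    labels = ["a"] * 8 + ["b"] * 2
    print(balanced_class_weights(labels))
```

    The resulting dictionary can usually be passed to a framework's per-class or per-sample weight argument; whether that is enough on its own depends on how severe the imbalance is.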
    [R] CVPR 2022 - Photorealistic Monocular 3D Reconstruction of Humans Wearing Clothing
    submitted by /u/SleekEagle [link] [comments]
    [R] My continuously updated machine learning research notes
    Dear ML researchers, For the past many years, I've been updating my machine learning research notes for my PhD students and everyone online continuously. I don't like uploading to arxiv to get "citations", and GitHub serves me well: Hope they are useful for you: https://github.com/roboticcam/machine-learning-notes Richard, submitted by /u/MLknowledge [link] [comments]  ( 1 min )
    [D] Correcting for imbalance in regression datasets
    Hi, I am performing an image --> scalar regression. The output scalar I am trying to estimate follows a roughly Gaussian distribution. I notice that the DNN is biased towards outputting values near the mean (which makes sense). This seems like a problem of imbalanced data. For classification, I can oversample minority classes. What is the equivalent for regression? Is there a technique for regression where we oversample "outliers" and undersample central values? submitted by /u/rsandler [link] [comments]  ( 1 min )
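    One way to approximate this for regression, sketched here under assumptions rather than as an established recipe: bin the targets, then weight (or resample) each example in inverse proportion to how many targets share its bin. The function name `inverse_density_weights` and the choice of equal-width bins are illustrative.

```python
def inverse_density_weights(targets, num_bins=10):
    """Weight each target by 1 / (count of its histogram bin), normalised to mean 1.

    Targets in sparsely populated bins (the "outliers") get larger weights,
    targets near the mode get smaller ones.
    """
    lo, hi = min(targets), max(targets)
    # Fall back to a width of 1.0 when all targets are identical.
    width = (hi - lo) / num_bins or 1.0

    def bin_of(y):
        # Clamp so the maximum value lands in the last bin.
        return min(int((y - lo) / width), num_bins - 1)

    counts = [0] * num_bins
    for y in targets:
        counts[bin_of(y)] += 1

    raw = [1.0 / counts[bin_of(y)] for y in targets]
    scale = len(targets) / sum(raw)  # normalise so the weights average to 1
    return [w * scale for w in raw]


if __name__ == "__main__":
    print(inverse_density_weights([0.0, 5.0, 5.0, 5.0, 5.0, 10.0], num_bins=3))
```

    The weights can be used as per-sample loss weights or as sampling probabilities for an oversampling data loader; very small bins may need a cap so a handful of extreme examples do not dominate training.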
    Building Dense Passage Retrievers [P]
    Hi, I made a video explaining the ideas behind building a Dense Passage Retriever (DPR). Whenever we talk about retrievers, we mostly refer to the DPR formulation which appeared in this paper. A lot of publicly available implementations also use this formulation. In a previous video, we discussed how to use the DPR End-to-End QA system, which uses DPR with a QA model. In this video, we solely focus on retrievers and the ideas behind building them. The implementation is quite similar to retrievers pre-trained with the Inverse Cloze Task. This video is part 8 of a 9-video series on open-domain question answering using dense retrievers. Thanks for the support, and I will appreciate any feedback. https://www.youtube.com/watch?v=w61p0HLo7gc submitted by /u/infiniteakashe [link] [comments]  ( 1 min )
    [D] How do you usually run sanity checks when training GANs ?
    Hi, I have been studying super-resolution with GANs and took a look at SRGAN and ESRGAN. I have spent the whole day running experiments to find out whether I can overfit on a single batch of 16 / 32 / 128 examples (MNIST). I have found that it's almost impossible to use this tactic as a sanity check because the model simply cannot generate good-quality samples. I would like to know your thoughts on this, and how you would run sanity checks for GANs. Thank you! submitted by /u/Frizzoux [link] [comments]  ( 1 min )
    [D] Amazon Releases a New Multilingual Dataset for NLU
    https://www.amazon.science/blog/amazon-releases-51-language-dataset-for-language-understanding submitted by /u/__lawless [link] [comments]
    [D] How to handle features that apply to a whole csv-file vs single rows?
    Hi all, I have csv-files (~300) with a fixed set of columns (~40) but varying number of rows (sum of all rows ~300 000) and multiple labels per csv that I want to predict. Because of the limited number of csv-files and as a first try I am predicting the labels row-wise (attaching the label to all the rows of one csv-file) which works well for some labels but not for others. Currently, I am calculating some features for every row and just appending them to the row and some features for the whole csv-file and appending them to every row. Two problems are now arising that I would like to hear some input about: The number of features per csv is growing and it seems like a waste to copy them to every row. For some labels it is probably reasonable to throw away most of the rows and only feed in a handful. How would you design a structure that incorporates the limited number of csv-files and the different ways to treat features (row vs. csv)? submitted by /u/tlklk [link] [comments]  ( 2 min )
    [R][P] Differences in publishing a paper at a conference and in a journal?
    Hi! I am an undergrad and I am going to start my MS in CS this fall. My research interest is mainly in multimodal learning for language and speech. I have written papers before, but both of my papers have been peer-reviewed journal papers (Knowledge-Based Systems, Elsevier) [1] [2]. I now want to start publishing papers in conferences, since I have noticed that it is much easier to get noticed and receive reviews when the paper is presented at a conference. I want to understand how the publication process differs for conferences. I also wanted recommendations on conferences in the NLP and speech area, considering this will be my first conference paper. Thanks! (I would also appreciate reviews on my papers if anyone has the time to look them over. Thanks!) submitted by /u/prabhav55221 [link] [comments]  ( 4 min )
    [D] How do you get the maximum of arxiv sanity?
    Basically, I don't want to phrase this as a "how-to" post, but arxiv-sanity-lite really bothers me. How do you guys find promising recent papers in your area of interest, besides following what is published at major conferences? I believe the website is "too lightweight". For example, what if I am interested in computer vision papers and I specify that in the tags field (i.e. explicitly typing "computer vision")? How can I list the papers based on a score (basically the goodness of the paper)? Why does using shortcuts (basically links) like recommend over last week or recommend over last 3 days always (at least for me) end up with 0 results? I've never used the original arxiv-sanity before, so I strongly believe that there is something I am missing. submitted by /u/Icy_Fisherman7187 [link] [comments]  ( 1 min )
    [N] New opportunity: PhD Candidate within multisensor data fusion and applied machine learning for analysis of Arctic sea ice
    The Norwegian University of Science and Technology (NTNU) has a vacancy for a PhD Candidate within the DIGITALSEAICE project. The project aims to build a multi-scale digital infrastructure that integrates local and regional sea ice models for improved forecasting and understanding of variations in polar ice conditions. More information here: https://www.jobbnorge.no/en/available-jobs/job/224802/ submitted by /u/KatjaKim [link] [comments]  ( 1 min )
    [P] Efficient Deep Learning Book
    We are working on a book that focuses on deep learning efficiency techniques such as quantization, pruning, distillation, etc. for both server-side and on-device (smartphones, IoT, etc.) applications. The goal is to introduce these ideas in a single place, without having to parse many papers, try to get a working code sample, and then spend time debugging. With the accompanying codelabs, we hope that our readers can make their models 4-20x smaller, faster, and better in quality. We have released draft PDFs of the first four chapters, and would truly appreciate any sort of comments / feedback. Book: efficientdlbook.com Feedback: hello@efficientdlbook.com submitted by /u/EfficientDLBook [link] [comments]  ( 1 min )
    [D] Interview w/ Google Brain researchers on Sparse Expert Models (Switch Transformers, GLAM, and more...)
    https://youtu.be/ccBMRryxGog This video is an interview with Barret Zoph and William Fedus of Google Brain about Sparse Expert Models. Sparse expert models have been hugely successful at distributing parts of models, mostly Transformers, across a large array of machines, using a routing function to route signals between them. This means that even though these models have a huge number of parameters, the computational load for a given signal does not increase, because the model is only sparsely activated. Sparse expert models, such as Switch Transformers and GLaM, can scale up to trillions of parameters and bring a number of desirable properties. We discuss everything from the fundamentals, history, strengths and weaknesses, up to the current state of the art of these models. OUTLINE: 0:00 - Intro 0:30 - What are sparse expert models? 4:25 - Start of Interview 5:55 - What do you mean by sparse experts? 8:10 - How does routing work in these models? 12:10 - What is the history of sparse experts? 14:45 - What does an individual expert learn? 19:25 - When are these models appropriate? 22:30 - How comparable are sparse to dense models? 26:30 - How does the pathways system connect to this? 28:45 - What improvements did GLAM make? 31:30 - The "designing sparse experts" paper 37:45 - Can experts be frozen during training? 41:20 - Can the routing function be improved? 47:15 - Can experts be distributed beyond data centers? 50:20 - Are there sparse experts for other domains than NLP? 52:15 - Are sparse and dense models in competition? 53:35 - Where do we go from here? 56:30 - How can people get started with this? Papers: Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity (https://arxiv.org/abs/2101.03961) GLaM: Efficient Scaling of Language Models with Mixture-of-Experts (https://arxiv.org/abs/2112.06905) Designing Effective Sparse Expert Models (https://arxiv.org/abs/2202.08906) submitted by /u/ykilcher [link] [comments]  ( 1 min )
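    The routing idea summarised in the post above can be sketched as top-1 routing: a small router scores every expert for a token, only the best-scoring expert runs, and its output is scaled by the router probability. This is a toy illustration in plain Python under assumed names (`route_top1`, `router_weights`, `experts`), not code from the papers or the video.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top1(token, router_weights, experts):
    """Send one token vector to the single highest-probability expert.

    token:          list of floats (the token representation).
    router_weights: one weight row per expert, same length as token.
    experts:        list of callables mapping a token vector to an output vector.
    Returns (chosen expert index, expert output scaled by its gate probability).
    """
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    output = experts[idx](token)  # only this one expert is evaluated
    return idx, [probs[idx] * v for v in output]


if __name__ == "__main__":
    experts = [lambda t: [2.0 * v for v in t], lambda t: [-v for v in t]]
    print(route_top1([1.0, 0.0], [[2.0, 0.0], [0.0, 0.0]], experts))
```

    Because only one expert is evaluated per token, adding more experts grows the parameter count without growing the per-token compute, which is the property the post describes.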
    [P] Interactive semantic map of ICLR 2022
    Next week ICLR 2022 is taking place: fully virtual, with 1000+ high-quality papers. To make sense of this volume of papers, we have indexed them and provide an interactive semantic map of #ICLR2022. Check it out: https://search.zeta-alpha.com/?q=&d=ly&doc_sources=ICLR&sort_by=authority To enjoy the full map, click on [Explore more] and then enter full screen mode. We will also discuss the program and 10 must-read papers in the Zeta Alpha "Trends in AI" ICLR edition webinar on Monday 25th, for which you can sign up here: https://us06web.zoom.us/webinar/register/7816505274568/WN_82DzwhXZQbOCSTWgaI9xMw Looking forward to meeting you online at ICLR 2022! https://preview.redd.it/6wdqj4ru7uu81.jpg?width=2202&format=pjpg&auto=webp&s=c97417c9ea39919041949bf3aa38ad33bb6eca5a submitted by /u/EngineerZetaAlpha [link] [comments]  ( 1 min )
    [R] MindSpore Paper Interpretation: MIEHDR CNN: Main Image Enhancement based Ghost-Free High Dynamic Range Imaging using Dual-Lens Systems
    This article is reproduced from Zhihu and translated with DeepL for enthusiasts to discuss. 1. Research Background High dynamic range (HDR) imaging is mainly aimed at picture display technology. If the luminance range of a scene exceeds the maximum luminance range an image can record, display quality drops considerably. HDR addresses this problem by recording a broader range of luminance, which yields a better display result. Current approaches to generating HDR images focus on fusing two low dynamic range (LDR) images with different exposures taken by the same camera. In such a solution by the camera shake or object movement during the exposure time to produce the proble…  ( 3 min )
    [D] Most efficient way to use large image datasets with clusters for ML?
    I am having trouble finding general information on this subject. I know I am down the rabbit hole when Google doesn't have an answer. I want to know best practices for using clusters for machine learning with large amounts of data. I believe I have a close-to-optimal solution but wanted to get some other opinions on the subject. My current setup: AWS EKS Kubernetes for a cluster; Kubeflow for the ML platform; Katib for HPT jobs; PyTorch for custom models; spot instance GPUs; Lustre for file serving to the models. My data: millions of images stored in S3, ~50TB of data. What is the most efficient way to move my data to the cluster? My current approach: (1) preprocess the data with a dedicated instance and store it in S3; (2) the master runs on a dedicated node; (3) Katib spins up a set number of GPU spot nodes; (4) a claim is made, and an FSx Lustre system is generated for the pod. Advantages: very fast training and data movement with spot training. Disadvantages: I have to spin up several Lustre systems for the training. Possible alternative: same as above, but use EFS as a distributed file system so I don't have to wait for Lustre. Advantages: potentially cheaper, as I have only one FS. Disadvantages: slow throughput; I read this was a bad idea. Other alternatives: use a PyTorch streaming function with S3 (boosted transfer speed); have every pod download the data to an EBS volume when it starts; give up and switch to SageMaker. I would really appreciate hearing the thoughts of anyone with experience in these technologies. submitted by /u/thewineiswater [link] [comments]  ( 1 min )
    [D] [P] Neural network: same prediction for different inputs
    I am getting the same prediction for different inputs. I am trying to use a regression neural network. Since the data is huge, I am training on one example at a time. Here is a simplified version of my code:

        model = Sequential()
        model.add(Dense(10000, input_dim=212207, kernel_initializer='normal', activation='relu'))
        model.add(Dense(100, activation='relu'))
        model.add(Dense(1, kernel_initializer='normal'))
        model.compile(loss='mean_squared_error', optimizer='adam')
        for i in range(10000000):
            # X is the input with 212207 values
            # Y is the output value
            if i < 6000000:
                model.fit(X.transpose(), Y, epochs=30, batch_size=1, verbose=0)
            else:
                prediction = model.predict(X.transpose())

    I made sure that I am training on different examples and trying predictions on different examples. I am still getting the same prediction value for all testing inputs. I think I made some mistake in defining the model for the regression neural network. Can you please check whether the code is correct? submitted by /u/exoplanet_hunter [link] [comments]  ( 1 min )
  • Open

    Fixed points of bilinear transformations
    Introduction I was puzzled the first time I saw bilinear transformations, also known as Möbius transformations. I was in a class where everything had been abstract and general, and suddenly things got very concrete and specific. I wondered why we had changed gears, and I wondered how there could be much to say about something […] Fixed points of bilinear transformations first appeared on John D. Cook.  ( 2 min )
    Partitioning complexity
    This post looks at how to partition complexity between definitions and theorems, and why it’s useful to be able to partition things more than one way. Quadratic equations Imagine the following dialog in an algebra class. “Quadratic equations always have two roots.” “But what about (x – 5)² = 0. That just has one root, […] Partitioning complexity first appeared on John D. Cook.  ( 4 min )
  • Open

    Hidden Interfaces for Ambient Computing
    Posted by Alex Olwal, Research Scientist, Google Augmented Reality and Artem Dementyev, Hardware Engineer, Google Research As consumer electronics and internet-connected appliances are becoming more common, homes are beginning to embrace various types of connected devices that offer functionality like music control, voice assistance, and home automation. A graceful integration of devices requires adaptation to existing aesthetics and user styles rather than simply adding screens, which can easily disrupt a visual space, especially when they become monolithic surfaces or black screens when powered down or not actively used. Thus there is an increasing desire to create connected ambient computing devices and appliances that can preserve the aesthetics of everyday materials, while providing …  ( 7 min )
  • Open

    Specify and extract information from documents using the new Queries feature in Amazon Textract
    Amazon Textract is a machine learning (ML) service that automatically extracts text, handwriting, and data from any document or image. Amazon Textract now offers the flexibility to specify the data you need to extract from documents using the new Queries feature within the Analyze Document API. You don’t need to know the structure of the […]  ( 11 min )
  • Open

    Understanding the Difference between Loss Functions and Metrics in Machine Learning/Deep Learning
    Yes! You read the heading right. There’s indeed a difference between loss functions and Metrics in the field of Machine Learning. However…  ( 2 min )
  • Open

    A new state of the art for unsupervised vision
    MIT CSAIL scientists created an algorithm to solve one of the hardest tasks in computer vision: assigning a label to every pixel in the world, without human supervision.  ( 7 min )
    Anticipating others’ behavior on the road
    A new machine-learning system may someday help driverless cars predict the next moves of nearby drivers, cyclists, and pedestrians in real-time.  ( 7 min )
  • Open

    Tooth Tech: AI Takes Bite Out of Dental Slide Misses by Assisting Doctors
    Your next trip to the dentist might offer a taste of AI. Pearl, a West Hollywood startup, provides AI for dental images to assist in diagnosis. It landed FDA clearance last month, the first to get such a go-ahead for dentistry AI. The approval paves the way for its use in clinics across the United Read article > The post Tooth Tech: AI Takes Bite Out of Dental Slide Misses by Assisting Doctors appeared first on NVIDIA Blog.  ( 4 min )
    GFN Thursday Is Fit for the Gods: ‘God of War’ Arrives on GeForce NOW
    The gods must be smiling this GFN Thursday — God of War today joins the GeForce NOW library. Sony Interactive Entertainment and Santa Monica Studios’ masterpiece is available to stream from GeForce NOW servers, across nearly all devices and at up to 1440p and 120 frames per second for RTX 3080 members. Get ready to Read article > The post GFN Thursday Is Fit for the Gods: ‘God of War’ Arrives on GeForce NOW appeared first on NVIDIA Blog.  ( 3 min )
  • Open

    Building Dense Passage Retrievers
    Hi, I made a video explaining the ideas behind building a Dense Passage Retriever (DPR). Whenever we talk about retrievers, we mostly refer to the DPR formulation which appeared in this paper. A lot of publicly available implementations also use this formulation. In a previous video, we discussed how to use the DPR End-to-End QA system, which uses DPR with a QA model. In this video, we solely focus on retrievers and the ideas behind building them. The implementation is quite similar to retrievers pre-trained with the Inverse Cloze Task. This video is part 8 of a 9-video series on open-domain question answering using dense retrievers. Thanks for the support, and I will appreciate any feedback. https://www.youtube.com/watch?v=w61p0HLo7gc submitted by /u/infiniteakashe [link] [comments]  ( 1 min )
    NN from Scratch: #4 Backward Propagation | Kolbenkraft
    submitted by /u/cjmodi306 [link] [comments]
    Searching for volunteers for ML-based Ukrainian volunteer project.
    We are searching for trustworthy volunteers with some free time who would like to contribute to a digital Ukrainian volunteer project. Our system relies heavily on an image recognition system with a number of specialized filters involving facial recognition, object recognition, logo detection, photoshop detection, etc. People with professional experience in any of these areas are preferred, but novice ML people are welcome to join us in a different capacity. DM to learn more about the project; we are glad to discuss the details with you. submitted by /u/eelgirl [link] [comments]  ( 1 min )
  • Open

    18 Differences Between Good and Great Data Scientists
    If you are employed as a data scientist and have survived (or strived!) in your position for more than a year, chances are you are at least a good data scientist. This is particularly true if you were promoted. The difference between a mediocre and a good data scientist will be the topic of a… Read More »18 Differences Between Good and Great Data Scientists The post 18 Differences Between Good and Great Data Scientists appeared first on Data Science Central.  ( 6 min )

  • Open

    Is there any difference between how DDPG and PPO use the replay buffer?
    submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
    Any tips for a prospective graduate student in Reinforcement Learning?
    Hello Everyone, I apologize ahead of time if posts like this aren't looked upon well on this sub, but I couldn't find rules against this and I also think this is the best, most niche sub for my question. I also made a new account just to be safe haha. Anyways, I will be graduating this spring with a BS in Computer Science and a BA in Mathematics. I have been researching Machine Learning since my sophomore year (adversarial machine learning) under a professor at my university and took on a second, concurrent research position in RL last summer. My goal is to get into a PhD program at a higher level than my current university (my current university is good, but doesn't really have much of an AI focus as I've already taken all the AI grad courses as an undergrad). I'm…  ( 3 min )
    Task Allocation problem with graph representation
    Hey everyone, I've recently started working on a task allocation problem using RL. I'd just like to make sure my thinking is correct on how to best approach the problem. At the moment, we have (effectively) a graph traversal sim for n agents, where the goal is to minimize the total distance over an episode, as determined by setting the correct tasks. The task supplied to each agent will determine the route that is taken, and therefore the distance. The current idea is to supply an input graph that also contains information on the current location of the agents. A second input would be the set of available tasks. The expected output would be produced by a pointer network, which returns a reordered set of the tasks in descending order of optimality. When step is called, the sim runs until a new task is needed (an agent completes its route). In general, does anyone know a good way to represent the inputs and output of this problem? A pointer network seems like it could work to produce actions, but if I need to do a forward pass for every agent, it seems that there would be no consideration of other agents when determining tasks (we shouldn't have 2 agents doing the same task). For the graph representation, a graph NN seems like an obvious choice, but I just wanted to see if anyone had any insight on why they may or may not be used. submitted by /u/asdfsflhasdfa [link] [comments]  ( 1 min )
    Universities working on reinforcement learning for robotics.
    Can you name any good universities (with a high acceptance rate) that are working on reinforcement learning for robotics and also accept students from other branches (e.g. Electrical or Mechanical Engineering)? submitted by /u/Better-Ad8608 [link] [comments]  ( 1 min )
    Is the game of chess a finite MDP?
    In the standard intro to RL book, I have read that any MDP that has finite actions and states is a finite MDP. But that limit is subjective. So there are approximately 10^45 states. If I limit myself to 10^5 states, can I say that chess isn't a finite MDP? submitted by /u/BraveProfessional656 [link] [comments]  ( 1 min )
    Reinforcement learning over traditional machine learning method in Finance/Banking ?
    I am currently studying use cases of RL in finance/banking/insurance and I am keen to understand its advantages and disadvantages compared with traditional methods. submitted by /u/kachua26 [link] [comments]  ( 1 min )
  • Open

    FormNet: Beyond Sequential Modeling for Form-Based Document Understanding
    Posted by Chen-Yu Lee and Chun-Liang Li, Research Scientists, Google Research, Cloud AI Team Form-based document understanding is a growing research topic because of its practical potential for automatically converting unstructured text data into structured information to gain insight about a document’s contents. Recent sequence modeling, which is a self-attention mechanism that directly models relationships between all words in a selection of text, has demonstrated state-of-the-art performance on natural language tasks. A natural approach to handle form document understanding tasks is to first serialize the form documents (usually in a left-to-right, top-to-bottom fashion) and then apply state-of-the-art sequence models to them. However, form documents often have more complex layouts …  ( 8 min )
  • Open

    [D] Who are using physics informed neural networks (PINN) in the industry?
    I stumbled upon this JD from Hitachi Energy, which mentions PINN in the section of preferred background: https://www.linkedin.com/jobs/view/2923292435/ Is PINN gaining more attention? And are there more players? submitted by /u/Kohomologia [link] [comments]  ( 1 min )
    [D] Is quantum AI a real thing? (from the software perspective)
    Hi all, I'm keeping an eye on the state of the art in quantum hardware, but what about software? I can think of many questions and maybe some of you are in the field. What should be the impact of quantum on ML/DL, realistically? What might be a roadmap for the software? And do quantum simulators already offer some benefits for AI? What are the best projects out there? I've seen many but haven't been very convinced. submitted by /u/IntelligentHat1657 [link] [comments]  ( 2 min )
    [D] A fairer AI freelancer marketplace that cares about freelancers' career advancement and benefits
    Hi, ML freelancers. I'm starting a freelancing marketplace tailored only to AI talent. I especially care about the welfare of freelancers, and plan to add the following: (1) you will be treated more like employees of the platform, so we provide training (for all), potentially a health care plan (for people who have stably worked >20 hours a week), a career advancement plan, and mentoring from experienced freelancers you can learn from; (2) open discussion between employers and you, so that you can scope the project better and set a reasonable rate and timeline; (3) we potentially provide MLOps tools to improve your productivity; (4) we avoid global competition by matching businesses only with freelancers in their local region or in more expensive areas. How attractive do you think this would be? And are any of these benefits already provided by Upwork, Freelancer, Toptal, or Fiverr? submitted by /u/meame2010 [link] [comments]  ( 2 min )
    [D] Diffusion models video tutorial
    Diffusion models have been behind a recent string of impressive generative results, including OpenAI's DALL-E 2. They’re powered by a simple yet expressive core mechanism. New video covering how they work: https://youtu.be/fbLgFrlTnGU submitted by /u/ariseff [link] [comments]
    [D] Building the Model Behind DoorDash’s Expansive Merchant Selection
    Interested in how DoorDash maintains a well-performing and diverse selection in the numerous markets they operate in despite entering the delivery market relatively late? I had the opportunity to collaborate on this project, which involved building a number of models that measured customer preferences, identified market cuisine categories, and predicted merchants' performance on the platform. I wanted to share the approach and some of the technical details with the ML community to get feedback on what we can improve and to show this cool use case to others working on similar sales-enablement models. Check out the blog post I wrote and let me know what you think of our approach. Building the Model Behind DoorDash’s Expansive Merchant Selection submitted by /u/EfficientString7431 [link] [comments]  ( 1 min )
    [D] What's hot in deep learning research at the moment ?
    I took a break from deep learning (starting last October). Now I want to get back, start a new project, and read papers. Where should I focus? Should I keep working on vision transformers, or maybe start something on geometric deep learning? What's hot and what's going on? submitted by /u/ovotheking [link] [comments]  ( 1 min )
    [P] A simple PyTorch YOLOv1 training pipeline GitHub Repo
    https://github.com/sovit-123/yolov1_pytorch_voc07 Also, I write about Deep Learning and Machine Learning on https://debuggercafe.com/ Please check it out and let me know if anybody wants blog posts on a specific topic. submitted by /u/sovit-123 [link] [comments]
    [P] Programmatic: Powerful Weak Labeling
    Hi all!, Really excited to share a project we've been working on and get your feedback! We've made: Programmatic — an NLP annotation tool for building large labeled datasets for NLP without manual annotation Programmatic is like a REPL for data annotation. You: 1. Write simple rules/functions that can approximately label the data 2. Get near-instant feedback across your entire corpus 3. Iterate and improve your rules Finally, it uses a Bayesian label model [1] to convert these noisy annotations into a single, large, clean dataset, which you can then use for training machine learning models. You can programmatically label millions of datapoints in the time taken to hand-label hundreds. What we do differently from weak supervision packages like Snorkel/skweak[1] is to focus on UI to …  ( 2 min )
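    The rule-writing loop described above can be sketched in a few lines. This is only an illustration under stated assumptions: the three rules and the toy corpus are hypothetical, and a plain majority vote stands in for the Bayesian label model the post mentions. It does not use Programmatic's actual API.

```python
from collections import Counter

ABSTAIN = None  # a rule returns ABSTAIN when it has no opinion


# Hypothetical labeling rules for a toy sentiment task (not from the post).
def rule_positive_words(text):
    return "pos" if any(w in text.lower() for w in ("great", "love")) else ABSTAIN


def rule_negative_words(text):
    return "neg" if any(w in text.lower() for w in ("awful", "hate")) else ABSTAIN


def rule_exclamation(text):
    return "pos" if text.endswith("!") else ABSTAIN


RULES = [rule_positive_words, rule_negative_words, rule_exclamation]


def weak_label(text, rules=RULES):
    """Apply every rule; return the majority vote, or ABSTAIN on no votes or a tie."""
    votes = [v for v in (rule(text) for rule in rules) if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting rules with equal support
    return ranked[0][0]


corpus = ["I love this phone!", "Awful battery, I hate it", "It arrived on Tuesday"]
labels = [weak_label(t) for t in corpus]
```

    Majority voting treats every rule as equally reliable; a label model would instead estimate per-rule accuracy, which matters once rules overlap and disagree often.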
    [D] What's your opinion on project promoting posts in this sub? Your vote matters.
    Many projects are promoted in this sub, and you may like or dislike them. If you disliked any of my posts, allow me to apologize first. However, it got me thinking. Several years ago I was a moderator of a fairly large forum; because I didn't have enough time to fulfill my responsibilities, I decided to retire (yes, moderators can, and I remained a VIP user, a status only retired moderators can hold). This is a large community, a machine learning community. Besides continuously removing some of these posts with no clear rules on it, can we do any better? We have all the data, and we just cannot train the model? Here are my three proposals, and please share better ideas than my poor ones: 1. A self-promoting post should offer value beyond the project itself and should not contain annoying content. 2. A self-promoted project can be used as a tool in a non-self-promoting post, as long as the post creates valuable content and the promotion is not obvious or annoying. 3. Depending on the number of new project posts, a weekly/daily project post can be created by a moderator and pinned to the top. All the promoting content goes into the comments, where we can explore and upvote. Here are some illustrations: 1. Direct Promoting Post 2. Indirect Promoting Post 3. Weekly/Daily Promoting Post by Moderator, Pinned to Top, Comments by project owner, upvotes/downvotes by us Which do you think is acceptable? Or do you have better ideas? Leave a comment. It's a machine learning sub, don't let a machine solve it better than us. submitted by /u/Remote_Cancel_7977 [link] [comments]  ( 3 min )
    [R] Differentiable signal processing for optical communication with Google JAX
    Hey folks, I wrote a mini project based on JAX for optical communications signal processing. https://github.com/remifan/commplax I have a research article as a use case demo, https://remifan.github.io/gdbp_study/article.html This tool essentially: (1) implements adaptive DSP equalizers as stateful NN layers (thanks to JAX's explicit stateful syntax); (2) implements compositor interfaces from scratch to wrap up those stateful layers with other regular NN layers so that they can be trained together; (3) provides homebrew serial compositions of stateful layers. It is a fun project for me and I feel JAX fits this research use really elegantly. What do you think about JAX? I appreciate your comments :) submitted by /u/StreetPrice1909 [link] [comments]  ( 1 min )
    [D] Tracking the hardware usage while running CV NN Model on a 1000 Images
    Hi guys, I've been working on a machine learning project and I wanted to see how hardware resources are being used when I run inference on, let's say, 1000 images. How could I calculate the CPU (running inference on CPU) and RAM workload in that timeframe? I'm running it on a Linux Ubuntu VM. Thanks in advance! submitted by /u/Fifi0912 [link] [comments]  ( 1 min )
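    One way to approach the question above, sketched with only the Python standard library on Linux. The `measure` helper and the `fake_inference` stand-in are hypothetical names introduced for illustration; a real model call would replace the stand-in.

```python
import resource
import time


def measure(fn, items):
    """Run fn on every item; report wall time, CPU time, and peak resident memory."""
    before = resource.getrusage(resource.RUSAGE_SELF)
    wall_start = time.perf_counter()
    for item in items:
        fn(item)
    wall = time.perf_counter() - wall_start
    after = resource.getrusage(resource.RUSAGE_SELF)
    # User + system CPU seconds consumed by this process during the loop.
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return {
        "wall_seconds": wall,
        "cpu_seconds": cpu,
        # Average number of busy cores over the run (cpu time / wall time).
        "avg_cpu_cores": cpu / wall if wall > 0 else 0.0,
        # On Linux ru_maxrss is in kilobytes; it is the peak since process start,
        # not only within the measured loop.
        "peak_rss_mb": after.ru_maxrss / 1024,
    }


def fake_inference(image):
    # Hypothetical stand-in for a model forward pass on one image.
    return sum(x * x for x in image)


stats = measure(fake_inference, [list(range(1000))] * 1000)
```

    Because the numbers come from the process's own accounting, this captures multi-threaded CPU use inside the process but nothing running in other processes on the VM.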
    [D] Running interactive Python notebooks on HuggingFace Spaces
    I'm working on Mercury, a framework for converting Python notebooks into interactive web apps. It can add widgets to the notebook based on a YAML configuration. End users can tweak widget values and execute the notebook. The resulting notebook can be downloaded as single-file HTML. Simple. The framework is built on Django+React. It is easy to deploy to Heroku or other cloud services. Recently, I made it possible to deploy it to Hugging Face Spaces (faster and larger machines than on free-tier Heroku). The process of deployment is simple. You need to create a Gradio app on Spaces (my framework is not supported, yet ;) ). You need to add the app.py file that will run the Mercury server and upload the notebook. You can check the details in the docs. The HF Space with an example notebook: https://huggingface.co/spaces/pplonski/deploy-mercury submitted by /u/pp314159 [link] [comments]  ( 1 min )
    [D] Conditional GAN with multiple adversarial losses - Implementation?
    I would like to test the architecture from the following paper with a different dataset: https://www.mdpi.com/2072-4292/13/19/3834 The authors state that their objective function is the following: https://preview.redd.it/u78f27jb6nu81.png?width=1027&format=png&auto=webp&s=32790a67ec829a1e79b252edd0714b8b3b5a7f4e Where: - x is the real grayscale image. - s is its downsampled version, which should be used both as the initial input of the generator performing the super-resolution and as a first conditional variable in the learning process. - e is another two-dimensional array containing values for a second additional conditional variable. The authors, however, state that this should be implemented by using two separate conditional adversarial losses, one for each of the conditional variables. To clarify, the first adversarial loss should be: AdvLoss1(ParametersG, ParametersD) = - Log(Discriminator(x,s)) - Log(1-Discriminator(Generator(s),s)) While the second would be: AdvLoss2(ParametersG, ParametersD) = - Log(Discriminator(x,e)) - Log(1-Discriminator(Generator(s),e)) These should then be summed for the backward pass. In my PyTorch implementation, however, I have only been able to set up a single adversarial loss, which could be defined as: CurrentAdvLoss(ParametersG, ParametersD) = - Log(Discriminator(x,(s,e))) - Log(1-Discriminator(Generator(s),(s,e))) I calculate it in the following training loop (simplified version, from the same question asked in the PyTorch forum) as errD and errG after conditioning the network on both s and e at the same time: https://discuss.pytorch.org/t/conditional-gan-with-multiple-adversarial-losses/149627 My question is: is there a way to modify this loop to obtain outputs that have been conditioned separately, first only on s and then only on e, and thus calculate the two separate adversarial losses originally proposed by the authors instead?
submitted by /u/Franken91 [link] [comments]  ( 2 min )
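    The two losses written out in the question can be computed separately and then summed. The sketch below is one possible reading, not the paper's implementation: it assumes the discriminator is simply evaluated twice per batch (once conditioned on s, once on e) and that its outputs are probabilities in (0, 1). The function names and the example numbers are hypothetical, and plain Python floats stand in for tensors.

```python
import math


def adv_loss(d_real, d_fake, eps=1e-12):
    """-log D(x, c) - log(1 - D(G(s), c)) for one conditioning variable c."""
    return -math.log(d_real + eps) - math.log(1.0 - d_fake + eps)


def total_adv_loss(d_real_s, d_fake_s, d_real_e, d_fake_e):
    """Sum of the s-conditioned and e-conditioned adversarial losses."""
    loss_s = adv_loss(d_real_s, d_fake_s)  # AdvLoss1: conditioned only on s
    loss_e = adv_loss(d_real_e, d_fake_e)  # AdvLoss2: conditioned only on e
    return loss_s + loss_e, loss_s, loss_e


# Hypothetical discriminator outputs for one sample.
total, loss_s, loss_e = total_adv_loss(0.9, 0.1, 0.8, 0.2)
```

    In an autograd framework the same structure would mean two discriminator forward passes per batch, one per conditioning input, with the two loss tensors added before a single backward call. Whether the paper uses one shared discriminator or two is not stated in the post.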
    [D] IJCAI 2022 Paper Notification
    This is the discussion for accepted/rejected papers in IJCAI 2022. Results are supposed to release today. submitted by /u/errohan400 [link] [comments]  ( 1 min )
    [R] Authors Claim to Have "Solved" MNIST and CIFAR
    Paper: https://arxiv.org/abs/2204.07953v1 Code: https://github.com/decurtoydiaz/learning_with_signatures Tangential resources of interest: https://arxiv.org/abs/1905.08494, https://en.wikipedia.org/wiki/Rough_path#Signature, and https://labelerrors.com/ Personally, I believe from their code on Github, they have a possible data leakage (in the same vein of the current issue raised there) as well as an accuracy of 100% on a test set is fishier than a fish market. However, I am very curious to hear from the court of public opinion. How is everyone feeling about this? submitted by /u/blingblingbeepbeep [link] [comments]  ( 4 min )
    [D] How do I evaluate if my data represent the target variable before training a machine learning algorithm?
    I have a dataset of point clouds where each point in the point cloud has a variable. I am trying to relate the local geometry features to that point variable by using FPFH. This means I am generating my own features from the dataset by first using an area of n points to compute normal-vector estimations, and then using x normal-vector estimations to compute the FPFH. However, the numbers x and n are arbitrary and other combinations might describe the target variable better. So I wanted to know if there is a method to evaluate how good a given x and n value are at describing the target variable. I considered the correlation between the features (n,x) and the target variable, but I read that this assumes linear combination redundancy. I am using scikit-learn. So basically I have features X(x,n) and a target variable Y. Which x and n, in the feature space X(x,n), describe the target variable Y best? I want to do it before the training because training my random forest regressor takes 3-4 hours and I want to test more combinations. submitted by /u/Neo-Rushdian [link] [comments]  ( 1 min )
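    One cheap screen that does not assume a linear relationship is a rank (Spearman) correlation between each feature column and the target. It detects monotonic but nonlinear dependence, though not arbitrary dependence. The sketch below is a standard-library illustration with made-up data; it is a suggestion, not something from the post. Since the poster uses scikit-learn, `sklearn.feature_selection.mutual_info_regression` is another model-free score worth comparing.

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(a, b):
    """Pearson correlation; undefined (division by zero) for constant inputs."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman(feature, target):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(feature), ranks(target))


# Made-up data: the target is a nonlinear but monotonic function of the feature.
feature = [1.0, 2.0, 3.0, 4.0, 5.0]
target = [v ** 3 for v in feature]
rho = spearman(feature, target)
```

    Each candidate (n, x) pair could then be scored by, say, the largest absolute rank correlation over its FPFH columns, and only the most promising few passed on to the 3-4 hour random forest training.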
    [Discussion] Training performance evaluation of MindSpore, a home-grown deep learning framework -- by ADSL Lab, CSU
    The article is reproduced from Zhihu, using deepl machine translation, for all enthusiasts to communicate Abstract Deep learning frameworks are the engines and motors for pushing the boundaries of artificial intelligence applications, and good deep learning frameworks can dramatically shorten the cycle of algorithm innovation and validation. In this report, we focus on the newly launched MindSpore framework, which has received a lot of industry attention, and systematically explore its model training speed on GPU clusters and compare it with popular international frameworks. In the evaluation experiments, we choose two classical models, ResNet and BERT-base, to test and analyze their performance with the same algorithm, the same dataset, and the same or similar performance hardware platf…  ( 7 min )
    [D] Why is the diffusion model so powerful when the math behind it is so simple?
    You can see the 200-line code here: https://nn.labml.ai/diffusion/ddpm/index.html and https://github.com/cloneofsimo/minDiffusion; the math is here: https://lilianweng.github.io/posts/2021-07-11-diffusion-models/ The algorithm is smart and simple, but its generation results seem more incredible than GANs', its speed is fast, and the model size is not too big: https://openai.com/dall-e-2/ , https://huggingface.co/spaces/multimodalart/latentdiffusion, https://www.reddit.com/r/dalle2 So 1st question: why is the diffusion model so powerful? Can someone explain it? 2nd question: has anyone used diffusion for NLP? UPDATED: "A multiverse portal to a new world opening up above Tokyo" by dalle2 (from r/dalle2) "A robot painting on a canvas while playing the piano" by dalle2 (from r/dalle2) "Mona Lisa in her studio painting Leonardo da Vinci" by dalle2 (from r/dalle2) "Science fiction illustration future city in the night | impressionism" by latentdiffusion "Science fiction illustration of Beauty and monsters | impressionism" by latentdiffusion "a painting of a girl with a fox sitting in a field at sunrise in the style of Claude Monet" by latentdiffusion submitted by /u/ghosthamlet [link] [comments]  ( 4 min )
    [D] Questions about Intel 12th gen Alder Lake CPUs
    I am looking to build a new PC but have struggled to find the info on how Intel's latest CPUs perform for data science/ML, so if anyone is using one for that purpose and can help with one or more of these questions it would be very helpful! Apologies if these questions should be directed elsewhere. I am planning to use WSL2/Ubuntu but have heard that Intel's thread director isn't implemented well yet in Linux (or Windows 10!), so it doesn't assign tasks properly. Has anyone experienced issues with this firsthand? Assuming the thread director is working, are the e-cores utilised at all in any typical DS workflows? E.g. will they get used with joblib or when training scikit-learn/gbms in parallel? Are the e-cores good enough to handle other stuff like web browsing etc whilst the p-cores are maxed out on model training, or is it still necessary to keep at least one p-core free to avoid crashing the PC? Also I have read that in Windows 11 (where the thread director works best) that the active window/tab could be assigned p-cores as a priority, which isn't very helpful for someone who needs to train models in the background etc, but not sure whether this is actually happening in practice. The consensus from benchmarks/reviews is that the hybrid architecture 'just works' and is superior to AMD right now, but those benchmarks are primarily for use in gaming/video editing. submitted by /u/FightingLikeBeavers [link] [comments]  ( 1 min )
    [N] The new Machine Learning Specialization by DeepLearning.AI and Stanford Online is launching soon! Join the Waitlist.
    We’re thrilled to announce a brand new Machine Learning Specialization, in collaboration with DeepLearning.AI, launching in June on Coursera! Learn essential real-world skills from AI pioneer Andrew Ng, who co-founded Google Brain and Coursera, led AI research at Baidu, and has impacted millions of AI learners. This updated 3-course Specialization will cover the latest machine learning techniques as well as foundational AI concepts that made its predecessor one of the world’s most popular machine learning courses. Join the waitlist! https://preview.redd.it/yujr31t6vku81.png?width=5000&format=png&auto=webp&s=0f4c4ef090bcdc7cfb04ee2c817d766f23c236a6 submitted by /u/Stanford_Online [link] [comments]  ( 1 min )
  • Open

    How Meta's multiverse could prove our universe is a fake
    submitted by /u/estasfuera [link] [comments]
    SingularAgent - Many Methods Make Light Work
    submitted by /u/dantheman333 [link] [comments]
    General AI In Healthcare | Machine Learning For Cardiovascular Disease | Color Night Vision
    submitted by /u/getrich_or_diemining [link] [comments]
    A realistic image AI software
    submitted by /u/Eurokiwiboy [link] [comments]
    Ant colony simulation
    submitted by /u/Seitoh [link] [comments]  ( 1 min )
    Is there any free open source AI model available for answering any bible related queries?
    A few years back I developed a very simple app just to show a few Bible verses. Though it is a very simple app, it got more than 50K installs without much promotion. So, I am thinking about promoting it, but I hesitate to do so because it is such a simple app. I would like to add some useful features before I start promoting it. I would like to add a feature which will allow users to ask any question related to the Bible and get a relevant answer. I assume that some Bible data is open source. Is there any free tutorial on how to implement an AI-based chat system that answers Bible-related queries after training on Bible data? Is there any app already providing this feature? submitted by /u/qptbook [link] [comments]  ( 1 min )
    Today, AI is becoming ubiquitous, in and out of the workplace. With artificial intelligence (AI) becoming more powerful, the questions that surround AI ethics are becoming more relevant.
    But can technology be controlled to avoid adverse outcomes? Let's understand how AI will help us to make a better world. https://us.sganalytics.com/blog/top-ethical-challenges-in-ai-the-price-of-progress/ submitted by /u/JencyJane [link] [comments]  ( 1 min )
    Top Ethical Challenges in AI – The Price of Progress
    submitted by /u/JencyJane [link] [comments]
    Artificial Nightmares: Dr. Strange || Clip Guided Diffusion AI Art Video [4K 20 FPS]
    submitted by /u/Thenamessd [link] [comments]
    Weekly China AI News: Chinese Prominent AI Lab Plagiarizes Big Model Paper; Microsoft Research Asia Halts Internship Hiring from US-Banned Universities; Beijing Announces New RISC-V Chip Institute
    submitted by /u/trcytony [link] [comments]  ( 1 min )
  • Open

    French palindromes and Morse code
    I got an email from a student in France who asked about a French counterpart to my post on Morse code palindromes, and this post is a response to that email. Palindromes A palindrome is a word that remains the same when the letters are reversed, like kayak. A Morse code palindrome is a word […] French palindromes and Morse code first appeared on John D. Cook.  ( 2 min )
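    The excerpt's definition is cut off, so the check below rests on an assumption: a word counts as a Morse code palindrome when the concatenated dots and dashes of its letters read the same in both directions. Two further assumptions are made for French input: accents are stripped (é is treated as e) and the gaps between letters are ignored. The function names are hypothetical.

```python
import unicodedata

# International Morse code for the 26 basic Latin letters.
MORSE = {
    "a": ".-", "b": "-...", "c": "-.-.", "d": "-..", "e": ".", "f": "..-.",
    "g": "--.", "h": "....", "i": "..", "j": ".---", "k": "-.-", "l": ".-..",
    "m": "--", "n": "-.", "o": "---", "p": ".--.", "q": "--.-", "r": ".-.",
    "s": "...", "t": "-", "u": "..-", "v": "...-", "w": ".--", "x": "-..-",
    "y": "-.--", "z": "--..",
}


def to_morse(word):
    """Concatenate the Morse codes of a word's letters, ignoring accents."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    letters = [c for c in decomposed if not unicodedata.combining(c)]
    # Raises KeyError for characters outside a-z (digits, ligatures, hyphens).
    return "".join(MORSE[c] for c in letters)


def is_morse_palindrome(word):
    """True when the word's dot-dash string equals its own reverse."""
    code = to_morse(word)
    return code == code[::-1]


checks = {w: is_morse_palindrome(w) for w in ("an", "sos", "kayak", "été")}
```

    Under these assumptions, an ordinary palindrome such as kayak is not automatically a Morse palindrome, because individual letter codes like "-.-" and ".-" are not mirror images of each other in sequence.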
    Blaschke factors
    Blaschke factors are complex functions with specified zeros inside the unit disk. Given a complex number a with |a| < 1, the Blaschke factor associated with a is the function Notice the semicolon in b(z; a). This is a convention that a few authors follow, and that I wish more would adopt. From a purely […] Blaschke factors first appeared on John D. Cook.  ( 2 min )
  • Open

    AI Application Development Guide for Business Owners
    To start deeply investigating the AI app development process, it’s important to first understand how these projects differ from regular app…  ( 9 min )
  • Open

    Neural Network gets too large and dies
    Hi, I've been working on a project for my computer science class and everything has been working up until the training. I'm following a guide online that has worked fairly well. Whenever I try to train, however, I run into an overflow error and the entire network dies. I'm not sure where to go from here, as I've tried a few steps to fix the issue. If anyone could offer some advice on fixing my problem, that would be amazing. submitted by /u/djm710 [link] [comments]  ( 2 min )
    7+ Best Books to Learn Neural Networks in 2022 for Beginners (Updated) -
    submitted by /u/maneesh123456 [link] [comments]
    Question about Sigmoid and Heaviside
    I read a paper and was a little bit confused. The paper said: "Imagine you have a two-dimensional (binary input) classification (0 or 1) problem and you use Sigmoid as an activation function. Since the Sigmoid gives you a real number between 0 and 1, it's not really classification anymore. Therefore you take the output of the Sigmoid (y_Sigmoid) and put this into a modified Heaviside function H(y-0.5) (so for y_Sigmoid bigger than 0.5, it gives you yHeavi = 1). The decision boundary is given by a straight line w1a1+w2a2+w0=0, and this whole process only works with the Sigmoid function as first activation function." The last paragraph confused me. Why can I assume that the decision boundary is exactly that (it's just the "normal decision boundary" for an SLP, so why does it also apply here), and why does it work only with the Sigmoid function as first activation function? submitted by /u/LawlHeyman [link] [comments]  ( 1 min )
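    The boundary in the quoted passage follows from two facts about the sigmoid: it is strictly increasing, and sigmoid(0) = 0.5. So sigmoid(z) > 0.5 exactly when z > 0, and with z = w1a1+w2a2+w0 the threshold sits on the line w1a1+w2a2+w0=0. A small numerical check, using hypothetical weights that happen to implement a logical AND:

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def heaviside(v):
    """Step function: 1 for strictly positive input, else 0."""
    return 1 if v > 0 else 0


def classify(a1, a2, w1, w2, w0):
    """Sigmoid unit followed by the shifted step H(y - 0.5)."""
    z = w1 * a1 + w2 * a2 + w0
    return heaviside(sigmoid(z) - 0.5)


# Hypothetical weights (not from the paper): the line a1 + a2 - 1.5 = 0.
w1, w2, w0 = 1.0, 1.0, -1.5
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
via_sigmoid = [classify(a1, a2, w1, w2, w0) for a1, a2 in inputs]
via_line = [heaviside(w1 * a1 + w2 * a2 + w0) for a1, a2 in inputs]
```

    Both lists agree, which is the point: thresholding the sigmoid output at 0.5 is the same decision as thresholding z at 0. The same argument would seem to hold for any strictly increasing activation thresholded at its value at z = 0 (for example tanh at 0), so the paper's "only with Sigmoid" remark may refer to the specific 0.5 threshold rather than to the shape of the boundary.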
    Starting a neural network
    I want to create a program that can take music I feed it and, over time, create its own music based on the inputs. I know I have to use a neural network and deep learning algorithms, but how do I get started? Thanks. submitted by /u/Saxy-Snark [link] [comments]  ( 2 min )
  • Open

    A First Course on Deploying Python Projects
    After all the hard work of developing a project in Python, we want to share our project with other people. It can be your friends or your colleagues. Maybe they are not interested in your code, but they want to run it and make some real use of it. An example is you created a […] The post A First Course on Deploying Python Projects appeared first on Machine Learning Mastery.  ( 10 min )
  • Open

    Offline RL Made Easier: No TD Learning, Advantage Reweighting, or Transformers
    A demonstration of the RvS policy we learn with just supervised learning and a depth-two MLP. It uses no TD learning, advantage reweighting, or Transformers! Offline reinforcement learning (RL) is conventionally approached using value-based methods based on temporal difference (TD) learning. However, many recent algorithms reframe RL as a supervised learning problem. These algorithms learn conditional policies by conditioning on goal states (Lynch et al., 2019; Ghosh et al., 2021), reward-to-go (Kumar et al., 2019; Chen et al., 2021), or language descriptions of the task (Lynch and Sermanet, 2021). We find the simplicity of these methods quite appealing. If supervised learning is enough to solve RL problems, then offline RL could become widely accessible and (relatively) easy to implemen…  ( 5 min )
  • Open

    DSC Weekly Digest 4/19/2022: The Case for Personal Knowledge Graphs
    I just moved. I’d like to say that I was highly organized, that I knew where every box ended up and what was in each box. I would be lying. Most people who move know the feeling of living in boxes even after the movers have left, the days spent dodging labyrinths of teetering cardboard,… Read More »DSC Weekly Digest 4/19/2022: The Case for Personal Knowledge Graphs The post DSC Weekly Digest 4/19/2022: The Case for Personal Knowledge Graphs appeared first on Data Science Central.  ( 4 min )
    How Microsoft Power BI Revolutionizes Business
    As cloud-based business intelligence becomes more and more popular in the market, one name has made quite a mark: Power BI. A Microsoft offering, Power BI is an interactive data visualization and analytics tool that promises to revolutionize business. Here are some of its key benefits to help you see how it can do that:… Read More »How Microsoft Power BI Revolutionizes Business The post How Microsoft Power BI Revolutionizes Business appeared first on Data Science Central.  ( 3 min )
    How AI and ML are transforming data quality management?
    Introduction In recent years technology has become prominent, both at work and at home. Machine learning (ML) and Artificial Intelligence (AI) are evolving quickly today. Almost everyone will have some interaction with a form of AI daily. Some common examples include Siri, Google Maps, Netflix, and social media (Facebook/Snapchat). AI and ML have popularly used buzzwords… Read More »How AI and ML are transforming data quality management? The post How AI and ML are transforming data quality management? appeared first on Data Science Central.  ( 4 min )
    Agile, Agile 2 and Agility, Part II
    In the previous article in this series, we discussed the difference between Agile and business agility and how Agile 2 addresses some of the omissions and failings of traditional Agile.  Both Agile and Agile 2 focus on accelerating digital development; however, the benefits of any Agile approach can be obviated if it is not implemented… Read More »Agile, Agile 2 and Agility, Part II The post Agile, Agile 2 and Agility, Part II appeared first on Data Science Central.  ( 4 min )
  • Open

    Search for knowledge in Quip documents with intelligent search using the Quip connector for Amazon Kendra
    Organizations use collaborative document authoring solutions like Salesforce Quip to embed real-time, collaborative documents inside Salesforce records. Quip is Salesforce’s productivity platform that transforms the way enterprises work together, delivering modern collaboration securely and simply across any device. A Quip repository captures invaluable organizational knowledge in the form of collaborative documents and workflows. However, finding […]  ( 6 min )

  • Open

    Do regulatory data projects really need design-time data lineage? Probably not.
    Your regulatory data project likely has no use case for design-time data lineage. TL;DR: mapping data lineage at design time, as an end in itself, has no regulatory use case or ROI. Buying a specialist tool to support that mapping has even less ROI. Regulations see that kind of documentary data lineage as ancillary at best.… The post Do regulatory data projects really need design-time data lineage? Probably not. appeared first on Data Science Central.  ( 10 min )
    Dark Energy, Dark Data
    During the 1990s, the physics community began to measure the brightness of certain supernovae in a novel way. This new method supported the conclusion Edwin Hubble had first arrived at in 1929 after discovering that galaxies are becoming more and more distant from us: Dark matter and dark energy play a role in why those… The post Dark Energy, Dark Data appeared first on Data Science Central.  ( 4 min )
    5 Main Benefits of Distributed Cloud Computing
    According to Gartner's predictions, by 2024 most cloud vendors will offer distributed cloud computing on a service basis. With the increasing rush into the cloud space and the digitalization of documentation, this industry is bound to grow. Understanding Distributed Cloud: Distributed cloud is an innovation on traditional cloud computing. It means… The post 5 Main Benefits of Distributed Cloud Computing appeared first on Data Science Central.  ( 5 min )
  • Open

    Speeding Up AI Algorithms- Inferencing challenges at the edge
    submitted by /u/Chipdoc [link] [comments]
    Build & share machine learning apps directly in browser using Gradio in Python
    submitted by /u/Illustrious_Row_9971 [link] [comments]
    What if You Are a Prototype for the Ultimate Sentient Artificial Intelligence?
    submitted by /u/IndependenceFun4627 [link] [comments]  ( 1 min )
    Overview of Relational Graph Convolutional Networks (RGCN)
    submitted by /u/aidev2040 [link] [comments]
    There are so many crappy chatbots because people don't pay attention to how they're performing. If you're one of them, here are metrics to keep in mind
    Hi there! Chatbots are not a "set and forget" thing like much other software. If you want to achieve great results with your chatbot, you need to improve it constantly. To know where and what to improve, you need to track and monitor chatbot analytics and the main chatbot metrics. General chatbot metrics: total number of users, user satisfaction, accuracy of the chatbot. Engagement metrics: active users, new users, conversation length, retention rate, bounce rate, flow completion rate. Conversational analytics: goal completion rate (GCR), fallback rate, human takeover rate. Bonus, revenue metrics: revenue generated, ROI / payback period. In the article we cover how to calculate each metric, and you can find the metrics you need depending on the industry you work in: https://botscrew.com/blog/chatbot-metrics/?utm_source=RedditArtificial&utm_medium=&utm_campaign=&utm_term=&utm_content= submitted by /u/Avandegraund [link] [comments]  ( 1 min )
    Wake-up Call for Science – AI System Develops 40,000 Chemical Weapons in 6 Hours
    submitted by /u/TheCnt23 [link] [comments]
    Stopping 'them' from spying on you: New AI can block rogue microphones
    submitted by /u/KelliaMcclure [link] [comments]
    which courses are good for complete beginners?
    Hello everyone, can someone recommend some good courses? I saw some courses on Udemy; is this one worth it? https://www.udemy.com/course/artificial-intelligence-az/ Or can I learn everything on YouTube? There are a few more on Udemy, but I don't know how good they are. Is it worth buying one of those, or are there better videos on YouTube? EDIT: I found another 4 courses: https://www.udemy.com/course/100-days-of-code/ https://www.udemy.com/course/complete-python-bootcamp/ https://www.udemy.com/course/python-for-data-science-and-machine-learning-bootcamp/ https://www.udemy.com/course/machinelearning/ Which one of them would you recommend the most? submitted by /u/Edrixor [link] [comments]  ( 1 min )
    AI will make us dumb: [2204.07888] AI, Ageing and Brain-Work Productivity: Technological Change in Professional Japanese Chess
    submitted by /u/kg4jxt [link] [comments]  ( 1 min )
    Any good resources to learn Default Theory?
    I am having a difficult time understanding Default Theory and the various methods e.g Makinson to find the extension of default theories submitted by /u/cocag13996 [link] [comments]
    I know that the voice in this video is made using Replica Studio's engine, but does anyone know which voice exactly was used?
    This I looked through the available ones, not a single one seems to match it. Sorry if this isn't the right sub to ask, but since Replica Studios doesn't have its own sub I don't know where submitted by /u/AxySmarts [link] [comments]  ( 1 min )
    Artificial Nightmares: Schizophrenia || Clip Guided Diffusion AI Art Video [4K 20 FPS]
    submitted by /u/Thenamessd [link] [comments]
  • Open

    [D] Resources for Images Anomaly Detection
    Hello all, I know that there is a lot going on in this field. I would like to get started on it and study more. As always, I like to start from the basics. Do you have any good resources (video, article, book) to start with? I know there are autoencoders and statistical models, but how do I learn more, and where/how do you keep studying? submitted by /u/bollolo [link] [comments]  ( 1 min )
    [R] Where can I find case studies on different ML projects?
    I am working on my research paper and would like to find resources which show the case studies of ML projects from the beginning to the end, doesn't matter if it failed or succeeded. submitted by /u/mkonu [link] [comments]
    [D] Create Labels for Data created by a GAN
    Hello there! I hope you have a great day! Currently I want to compare how good multiple GANs (Vanilla GAN, WGAN, DCGAN, ...) are for a given use case. Therefore I trained the various GAN versions with data of two different classes (i.e. apple and banana). Now I want to show that data I generate with the Generator can be used to train i.e. a classifier that can distinguish between real images of apples and bananas. Can I somehow create labels for the data I generate with the Generator in a smart way? So that I know that a generated image of the generator should for example be an apple? How do i do that? submitted by /u/Bonkikong [link] [comments]  ( 1 min )
    [P] Luminide: new optimization Early Ranking achieves higher accuracy AI models
    Luminide introduces a new optimization, called Early Ranking, which makes it easier to build better AI models. Early Ranking achieves the same AI training results with up to 10x less compute – this saves time, reduces costs, and increases model accuracy. Luminide's IDE is a customized version of JupyterLab with integrated AI dev tools. Luminide used Early Ranking to place Top 1% in the CVPR Plant Pathology Kaggle competition. You can read about how we developed our winning model, and you can too, in our new blog post: Better Automation for Higher Accuracy AI Models. Class activation maps give insights into Luminide's winning model. Luminide is a new cloud platform for AI model development. Check out our demo video for a quick overview, or try it for yourself (sign up today and receive 100 hours of free GPU cloud compute). submitted by /u/LuminideInc [link] [comments]  ( 1 min )
    [D] generic discussion on freelance ML engineers
    Hi, reddit. Recently, I've been looking into a freelance career path. Currently, I'm a researcher at a top company. So far, I know of Toptal, Upwork, and Freelancers. I checked them out, and it seems that on Toptal you still end up working for large corporations, mostly as a full-time contractor, which is not really a different or better option than my current work. Freelancers has too many bidders from developing countries. Besides which platform to use, I have more questions about the obstacles we face as freelance ML engineers. Even though I am an AI researcher, I have never deployed a model in production. Usually a task at a big company takes a team or multiple teams to complete the MLOps lifecycle; how can you do it as a single person? Any sharing of experience would be of great help. submitted by /u/meame2010 [link] [comments]  ( 2 min )
    [R] Looking for AI/ML experts from Southeast Asia to interview for master thesis
    Hello everyone, I am a student from Germany writing my master thesis on Digital Transformation in ASEAN with AI/ML. For my thesis I would like to interview AI/ML experts from the ASEAN region to talk about the digital development of each country, challenges and potentials. (If you are not native there, but you have a work connection or just knowledge about the region and its AI development, I appreciate that as well.) It would be awesome if some of you were open to talk to me. A few sentences are enough, I won't take much of your time. If you want, we can do a video call as well. I will quote you of course. Thank you guys. submitted by /u/BlueLagoon357 [link] [comments]  ( 1 min )
    Dealing with numerically 0 likelihood in probabilistic models [R]
    I'm trying to find literature on solving the following issue: In most probabilistic ML models, we model the joint distribution over a set of random variables, p(x1, ..., xN). If N is very large (e.g. 100, 500, or even 1000), then regardless of how you model this, the distribution's highest point of density is still quite tiny. E.g. if you consider an isotropic multivariate Gaussian of 100 dimensions, the highest point of density will be somewhere in the neighbourhood of 1.6e-40. So when it comes time to evaluate log likelihood for a model like this, the probability is numerically 0, so the log probability goes to negative infinity. Is there work around solving these kinds of issues? I.e. by constraining the model in some way, or scaling model output, etc? I've done some googling, but am having a hard time finding papers on the subject. Not even sure what to call the problem... Curse of dimensionality in PGMs? Any recommendations of papers / talks / etc is greatly appreciated! submitted by /u/CS_Student95 [link] [comments]  ( 1 min )
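    One common way around the underflow described above is to never form the probability itself: sum per-dimension log-densities instead of multiplying densities, and use the log-sum-exp trick wherever probabilities have to be added (for example in mixtures). The sketch below assumes a standard isotropic Gaussian with N = 1000; the helper names are illustrative and not taken from the post.

```python
import math

def gaussian_logpdf(x, mean=0.0, std=1.0):
    """Log-density of a univariate Gaussian, computed without exponentiating."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)

def joint_log_likelihood(xs):
    """Log-density of an isotropic standard Gaussian: a sum of per-dimension terms."""
    return sum(gaussian_logpdf(x) for x in xs)

def logsumexp(values):
    """Stable log(sum(exp(v))) for adding probabilities that are stored as logs."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))

n = 1000
# Multiplying the densities underflows to exactly 0.0 in float64 ...
naive_density = math.prod(math.exp(gaussian_logpdf(0.0)) for _ in range(n))
# ... while the same quantity kept in log space is an ordinary finite number.
peak_log_density = joint_log_likelihood([0.0] * n)
```

    Here `naive_density` is zero while `peak_log_density` is roughly -918.9, so a training objective written directly in terms of log-densities never sees the negative infinity.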
    [R][P] GAN-Control: Explicitly Controllable GANs + Gradio Web Demo
    https://i.redd.it/v61jw1fekiu81.gif Abstract: We present a framework for training GANs with explicit control over generated facial images. We are able to control the generated image by setting exact attributes such as age, pose, expression, etc. Most approaches for manipulating GAN-generated images achieve partial control by leveraging the latent space disentanglement properties, obtained implicitly after standard GAN training. Such methods are able to change the relative intensity of certain attributes, but not explicitly set their values. Recently proposed methods, designed for explicit control over human faces, harness morphable 3D face models (3DMM) to allow fine-grained control capabilities in GANs. Unlike these methods, our control is not constrained to 3DMM parameters and is extendable beyond the domain of human faces. Using contrastive learning, we obtain GANs with an explicitly disentangled latent space. This disentanglement is utilized to train control-encoders mapping human-interpretable inputs to suitable latent vectors, thus allowing explicit control. In the domain of human faces we demonstrate control over identity, age, pose, expression, hair color and illumination. We also demonstrate control capabilities of our framework in the domains of painted portraits and dog image generation. We demonstrate that our approach achieves state-of-the-art performance both qualitatively and quantitatively. submitted by /u/Illustrious_Row_9971 [link] [comments]  ( 1 min )
    [D] Who funds the leading conferences in the field?
    I know that the publishers of the leading journals are mostly for-profit organizations, which is weird because as researchers in the field we effectively “volunteer” for free peer review, or even pay to publish and read papers. On the other hand, I wasn't able to find information about the funding and profit goals of the leading conferences. Take NeurIPS for example: I found that it is organized by the “NeurIPS Foundation”, but what exactly is this foundation? I couldn’t find any information on the subject. My point is, if the conferences are non-profit, it sounds like they should be preferred over funding for-profit organizations. submitted by /u/Careful_Winner_2335 [link] [comments]  ( 1 min )
    [D] NLP has HuggingFace, what does Computer Vision have?
    I've been writing tutorials with Pinferencia and HuggingFace. HuggingFace is quite handy and easy to use. I want to write some tutorial about computer vision afterwards. Is there anything similar in Computer vision area? submitted by /u/Remote_Cancel_7977 [link] [comments]  ( 2 min )
    [D] Why no paper in Speech Emotion Recognition train on multiple datasets ?
    I took a look at multiple of them and I was curious why they seemed to benchmark on multiple datasets but for the training, they restrained themselves to only 1 for training instead of merging them. From that they get good scores on the one they trained on, but bad ones for the rest. submitted by /u/raysamram [link] [comments]  ( 1 min )
    [P] SparseServer.UI : A UI to test performance of Sparse Transformers
    You can now load multiple transformers (each model has a unique sparsification recipe) on top of the DeepSparse server behind Streamlit, and it's open-source. This was battle tested on a 16GB of RAM with only 4 core CPU virtual machine. These compute requirements are enough to load up to 19 sparse BERT models in memory and compare their performance on question answering (P.S. they are really fast on just CPUs). 💻code: https://github.com/neuralmagic/deepsparse/tree/main/examples/sparseserver-ui submitted by /u/Quantum_Stat [link] [comments]  ( 1 min )
    [Research] Learning with Signatures
    This paper reports "results on AFHQ dataset, Four Shapes, MNIST and CIFAR10 achieving 100% accuracy on all tasks." The authors used few-shot classification "by comparing each test sample (after optional augmentation and computation of the element-wise mean) against a representative element-wise mean signature computed by averaging the signatures of a given number of train samples." What are your thoughts on this? Learning with Signatures - https://arxiv.org/abs/2204.07953 submitted by /u/Marmadelov [link] [comments]  ( 1 min )
    [Project] [Research] Simple Speech Recognition System
    GitHub: Bangla Spoken Number Recognition. Dataset: our custom dataset on Bangla numerals. Publications: though it's on digits (0-9). We have created a simple speech recognition system for recognizing Bangla numerals from '০-৯৯' (0-99). In this project, audio samples from different genders, age groups, and dialects of Bangladeshi people were used to create a speech dataset of spoken numbers from '০-৯৯' (0-99). The raw speech data is subjected to various audio augmentation techniques such as time shift, speed tuning, background noise mixing, and volume tuning. Then, to extract meaningful features from the data, Mel Frequency Cepstrum Coefficients (MFCCs) are used. We used Convolutional Neural Networks (CNNs) to develop a Bangla number recognition system. The proposed method recognizes '০-৯৯' (0-99) Bangla spoken numbers with 89.61% accuracy across the entire dataset. The model’s effectiveness was also tested using 10-fold cross-validation, with 89.74% accuracy for recognizing '০-৯৯' (0-99) Bangla spoken numbers across the entire dataset. I hope this work will help you in some way. :) submitted by /u/PIASR0Y [link] [comments]  ( 1 min )
    [P] Improving mulitclass classification accuracy with Jain's Fairness Index
    This is a light implementation of the idea in the paper Leveraging Uncertainties in Softmax Decision-Making Models for Low-Power IoT Devices. Instead of finding uncertainties I have added Jain's Fairness Index as a addition to the loss function. Gist: https://gist.github.com/Gananath/8d167384da7d3bc078650c73fab1a8dd submitted by /u/gananath [link] [comments]
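    For readers unfamiliar with the quantity involved, Jain's fairness index of a vector x with n entries is (sum x)^2 / (n * sum x^2): it equals 1 for a perfectly uniform vector and 1/n when all mass sits on one entry. The sketch below computes it for a softmax output; the `loss_with_fairness` helper, its sign, and its weight are hypothetical illustrations, since the post does not say how the gist combines the index with the loss.

```python
def jain_index(values):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2), in [1/n, 1] for non-negative x."""
    n = len(values)
    total = sum(values)
    squares = sum(v * v for v in values)
    if squares == 0.0:
        raise ValueError("index is undefined for an all-zero vector")
    return (total * total) / (n * squares)

def loss_with_fairness(base_loss, probs, weight=0.1):
    """Hypothetical combination: add a weighted fairness term to an existing loss.

    With a positive weight this penalises near-uniform (uncertain) outputs;
    the linked gist may use a different sign or weighting.
    """
    return base_loss + weight * jain_index(probs)

uniform_score = jain_index([0.25, 0.25, 0.25, 0.25])  # maximally "fair" output
peaked_score = jain_index([1.0, 0.0, 0.0, 0.0])       # fully confident output
```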
    [D] Are workshop papers considered "final publications"?
    Specifically, I'm talking about workshops of major conferences (NeurIPS, ICLR, ICML, etc.). If I submit a paper and it gets accepted, is that workshop paper a "final publication"? Or would most people expect the project to continue being developed into a slightly larger/longer paper for submission to the main stream of a conference? And if so, does publishing the earlier workshop paper tend to hinder or harm the later conference submission? I recognise there's a variety of workshops, and perhaps each have different expectations or norms. I'm wondering, from my outsider's perspective, how can I tell? For example, I have been thinking about submitting to one of these ICML workshops: https://icml-compbio.github.io/ or https://www.tagds.com/workshops/tag-in-machine-learning. Is there an easy way to tell whether either or both of these are "final publication" venues or not? submitted by /u/tfburns [link] [comments]  ( 2 min )
    [R] Maximum likelihood estimation can fail due to "Manifold Overfitting"
    arXiv: https://arxiv.org/abs/2204.07172 This paper out today seems to make the bold claim that maximum likelihood estimation is not a well-posed training objective in deep generative modelling. The manifold hypothesis says that observed high-dimensional data clusters around low-dimensional manifolds, but maximum likelihood methods (e.g. VAE, normalizing flows) learn high-dimensional densities. The paper argues that the mismatch between dimensionalities will lead to a problem called "manifold overfitting". Models are able to maximize likelihood in high-dimensions by sending the density to infinity around the low-dimensional manifold, but they can do this while completely ignoring the distribution of data on the manifold. So in other words, high capacity models will learn the data manifold…  ( 5 min )
  • Open

    "Reinforcement Learning with Action-Free Pre-Training from Videos", Seo et al 2022
    submitted by /u/gwern [link] [comments]
    "Inferring Rewards from Language in Context", Lin et al 202
    submitted by /u/gwern [link] [comments]
    Bandit problems as sequential decision problems
    Any reinforcement learning problem can be modeled as a sequential decision problem (SDP), which can always be modeled as a Markov decision process (need to model the state carefully). An example of an SDP is a multiarmed bandit problem, where the state is the vector of beliefs about the performance of each arm (or beliefs about a continuous parametric model). Decisions are made by a policy, and there are four classes of policies. For some reason, the RL community tends to focus on just one of the four classes (UCB policies, which fall in the class of cost function approximations), but there are entire communities using each of the other three classes. See chapter 7 of my new book (https://castlelab.princeton.edu/RLSO/) for a complete summary of the four classes of policies for pure learning problems (aka bandit problems). Note that Sutton and Barto (2nd edition) cover bandit problems in chapter 2, and then introduce MDPs in chapter 3. A bandit problem *is* an MDP! submitted by /u/powell-sda [link] [comments]  ( 1 min )
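    To make the "state is the vector of beliefs" idea concrete, here is a minimal sketch that assumes Bernoulli rewards and Beta(1, 1) priors (both assumptions, not from the post). The belief state is a tuple of Beta parameters, and the MDP transition is simply the Bayesian update after pulling an arm.

```python
def initial_belief(num_arms):
    """Belief state: one (alpha, beta) pair of Beta parameters per arm."""
    return tuple((1, 1) for _ in range(num_arms))

def transition(belief, arm, reward):
    """MDP transition: pulling `arm` and observing a 0/1 reward updates only that arm."""
    alpha, beta = belief[arm]
    updated = (alpha + reward, beta + 1 - reward)
    return belief[:arm] + (updated,) + belief[arm + 1:]

def posterior_mean(belief, arm):
    """Expected payout of `arm` under the current belief."""
    alpha, beta = belief[arm]
    return alpha / (alpha + beta)

state = initial_belief(2)
state = transition(state, 0, 1)  # arm 0 pays out
state = transition(state, 0, 1)  # arm 0 pays out again
state = transition(state, 1, 0)  # arm 1 does not
```

    Any policy, UCB included, is then a function from this belief state to an arm, which is what places the bandit inside the MDP framework.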
    Getting started with UAV/drone control
    Hi, is it currently possible to train a UAV and implement the learned policy in real life? I understand there are different environments for training, e.g. AirSim, GymFC, and others. However, the interesting part for me is the link to the real world: Is there a way to directly implement any learned policy on a real drone, e.g. a commercially available quadcopter? Which UAV would support such functionality? I'd love to get started on training drones for RL purposes (search and rescue, etc.), but if there is no way to test it in real life then this would be disappointing. submitted by /u/FrankTheThanks [link] [comments]  ( 1 min )
    Question about Expected Sarsa for prediction vs control
    I am having a hard time figuring out what makes the difference between Expected Sarsa for prediction vs for control. For off-policy Expected Sarsa I believe it's possible to use one epsilon value for a target policy that is epsilon-greedy and another epsilon value for a behaviour policy that is epsilon-greedy. The target policy would be used within the expected value calculation in the update of Q(S,A), the action value function, and the behaviour policy would be used to choose actions from the current state. But I'm not sure how to differentiate between the control version of the algorithm compared to the prediction version though. I think prediction usually finds the state-value function but I know that on-line Sarsa for prediction uses Q(S,A) so I'm not sure how to determine the difference between prediction and control algorithms. submitted by /u/lifelifebalance [link] [comments]  ( 1 min )
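    A minimal tabular sketch of the off-policy Expected Sarsa update described above, assuming an epsilon-greedy target policy and a Q-table stored as a dict of lists (the function names are illustrative). One common way to draw the line, offered here as a general convention rather than a definitive answer: if the target policy is held fixed, the update only evaluates it (prediction); if the target policy is re-derived from the current Q on every step, as below, the same update also improves the policy (control).

```python
def epsilon_greedy_probs(q_values, epsilon):
    """Action probabilities of an epsilon-greedy policy over one state's Q-values."""
    n = len(q_values)
    best = max(range(n), key=lambda a: q_values[a])
    probs = [epsilon / n] * n
    probs[best] += 1.0 - epsilon
    return probs

def expected_sarsa_update(q, s, a, r, s_next, alpha, gamma, target_epsilon):
    """One Expected Sarsa step; the expectation is taken under the target policy."""
    probs = epsilon_greedy_probs(q[s_next], target_epsilon)
    expected = sum(p * v for p, v in zip(probs, q[s_next]))
    q[s][a] += alpha * (r + gamma * expected - q[s][a])

q = {0: [0.0, 0.0], 1: [1.0, 3.0]}
# Action a=0 would have been chosen by a separate behaviour policy.
expected_sarsa_update(q, s=0, a=0, r=1.0, s_next=1,
                      alpha=0.5, gamma=1.0, target_epsilon=0.0)
```

    With `target_epsilon=0.0` the expectation collapses to the maximum, so this particular call coincides with a Q-learning update.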
    exploration strategies in discrete action spaces
    Hello there, I am working on the Missile Command game, and as a baseline I mostly use RLlib's PPO. The algorithm never converges, and I suspect it is because of a lack of exploration. Since the timesteps are small, the target usually oscillates around the center of the screen, so the agent never explores moving near the border and then firing (to counter an incoming missile). What methods should I try? I have already done reward scaling and frame stacking. Any suggestions for solving this game are much appreciated. Last question: do you know of similar (and common) environments that have been solved? Maybe their solutions show the path to follow. Thank you :) submitted by /u/Street_Excitement_14 [link] [comments]  ( 1 min )
    Confusion of hyperparameters in ppo
    I'm reading the PPO paper https://arxiv.org/abs/1707.06347 and I'm confused about one of the hyperparameters in Table 4: Log stdev. of action distribution | LinearAnneal(-0.7, -1.6). To the best of my knowledge, in the continuous setting the policy outputs a mean and a std, so why is the stdev of the action distribution given as a hyperparameter? Also, what is LinearAnneal in detail? submitted by /u/StrawberryTemporary7 [link] [comments]  ( 1 min )
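    One plausible reading of that table entry, stated as an assumption rather than something confirmed in the post: the network outputs only the mean, and the log standard deviation is a state-independent number that follows a schedule, moving linearly from -0.7 to -1.6 over training. The sketch below shows such a schedule; `linear_anneal` and `action_std` are illustrative names.

```python
import math

def linear_anneal(start, end, progress):
    """Linearly interpolate from `start` to `end` as `progress` goes from 0 to 1."""
    progress = min(1.0, max(0.0, progress))
    return start + progress * (end - start)

def action_std(step, total_steps, start=-0.7, end=-1.6):
    """Standard deviation implied by the annealed log-std at a given training step."""
    return math.exp(linear_anneal(start, end, step / total_steps))

std_at_start = action_std(0, 1000)    # wider exploration early in training
std_at_end = action_std(1000, 1000)   # narrower distribution at the end
```

    Under this reading, exploration noise shrinks on a fixed timetable instead of being learned by gradient descent.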
    Need help about categorical dqn
    I don't know how the projection of TZ to match Z works, and I also don't understand the formula. Can someone do a step-by-step calculation to demonstrate? submitted by /u/Professional_Card176 [link] [comments]  ( 1 min )
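    A step-by-step sketch of the categorical projection for a single non-terminal transition, assuming evenly spaced atoms between `v_min` and `v_max` (parameter names are illustrative). Each atom z_j is shifted to r + gamma * z_j, clipped to the support, and its probability is split between the two neighbouring atoms in proportion to how close the shifted value is to each.

```python
import math

def project_distribution(probs, reward, gamma, v_min, v_max):
    """Project the shifted distribution r + gamma * Z back onto the fixed atoms of Z."""
    n = len(probs)
    delta = (v_max - v_min) / (n - 1)          # spacing between atoms
    projected = [0.0] * n
    for j, p in enumerate(probs):
        z_j = v_min + j * delta                # original atom
        tz = min(v_max, max(v_min, reward + gamma * z_j))  # shifted, clipped atom
        b = (tz - v_min) / delta               # fractional index on the support
        lower, upper = math.floor(b), math.ceil(b)
        if lower == upper:                     # landed exactly on an atom
            projected[lower] += p
        else:                                  # split mass between the neighbours
            projected[lower] += p * (upper - b)
            projected[upper] += p * (b - lower)
    return projected

# Atoms at 0, 1, 2 with all mass on the atom z = 1; reward 0.5, gamma 1.
result = project_distribution([0.0, 1.0, 0.0], reward=0.5, gamma=1.0,
                              v_min=0.0, v_max=2.0)
```

    In this worked case the atom z = 1 moves to 1.5, which sits halfway between atoms 1 and 2, so its mass is split evenly and `result` is `[0.0, 0.5, 0.5]`.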
  • Open

    Overview of Relational Graph Convolutional Networks (RGCN)
    submitted by /u/aidev2040 [link] [comments]
    This is a long shot, but does anyone remember...
    Hi, this is a very long shot. I have been trying to remember the name of a science TV show which aired in the UK back in the 90's. It focussed on Neural Networks and gave some brilliant examples of environmental sensing. There was also a section showing a simple voice synthesiser which "babbled" like a child. I thought it may have been an "Horizon" show, however, I have been through the list of shows from that time and none appear to be right. If anyone has a memory of this show please let me know. One of the visuals I remember was a plastic skull with an LED matrix inside showing patterns. Obviously this was just some smoke and mirrors, however, it may trigger a memory. I'm trying to recall something from best part of 30 years ago.. submitted by /u/_m0xya_ [link] [comments]  ( 1 min )
  • Open

    10 seats remaining | A series of live ML strategy workshops
    Sponsored Post Unlike traditional online courses, Foster Provost’s workshops will give you the chance to engage live with a world-class […] The post 10 seats remaining | A series of live ML strategy workshops appeared first on Machine Learning Mastery.  ( 2 min )
  • Open

    Learning to Prompt for Continual Learning
    Posted by Zifeng Wang, Student Researcher, and Zizhao Zhang, Software Engineer, Google Research Supervised learning is a common approach to machine learning (ML) in which the model is trained using data that is labeled appropriately for the task at hand. Ordinary supervised learning trains on independent and identically distributed (IID) data, where all training examples are sampled from a fixed set of classes, and the model has access to these examples throughout the entire training phase. In contrast, continual learning tackles the problem of training a single model on changing data distributions where different classification tasks are presented sequentially. This is particularly important, for example, to enable autonomous agents to process and interpret continuous streams of informati…  ( 7 min )
  • Open

    Integrate ServiceNow with Amazon Lex chatbot for ticket processing
    Conversational interfaces (or chatbots) can provide an intuitive interface for processes such as creating and monitoring tickets. Let’s consider a situation in which a recent hire on your team is required to cut tickets for office equipment. To do so, they have to interact with a ticketing software that the organization uses. This often requires […]  ( 10 min )
  • Open

    Inversion in a circle
    Inversion in the unit circle is a way of turning the circle inside-out. Everything that was inside the circle goes outside the circle, and everything that was outside the circle comes in. Not only is the disk turned inside-out, the same thing happens along each ray going out from the origin. Points on that ray […] Inversion in a circle first appeared on John D. Cook.  ( 2 min )
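    The map described above can be written down directly: a point at distance r from the origin is sent along its ray to distance 1/r, which in coordinates means dividing by the squared norm. A minimal sketch follows (the function name is illustrative).

```python
def invert_in_unit_circle(x, y):
    """Send (x, y) to the point on the same ray whose distance from the origin is 1/r."""
    r2 = x * x + y * y
    if r2 == 0.0:
        raise ValueError("the origin has no image under inversion")
    return x / r2, y / r2

outside = invert_in_unit_circle(2.0, 0.0)   # a point outside the circle moves inside
inside = invert_in_unit_circle(0.0, 0.5)    # a point inside the circle moves outside
on_circle = invert_in_unit_circle(1.0, 0.0) # points on the circle stay put
```

    Applying the function twice returns the original point, up to floating-point rounding.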
  • Open

    Don’t let data drift derail edge compute machine learning models
    Edge computing has come of age, with deployments enabling many applications that process data from IoT sensors and cameras. In 2017, we identified the symbiotic relationship between edge computing and video analytics in an article, noting that live video analytics is the “killer app” for edge computing. Edge devices come in various shapes and sizes […] The post Don’t let data drift derail edge compute machine learning models appeared first on Microsoft Research.  ( 5 min )
  • Open

    A Quick Guide To Find The Right Minds For Annotation Is So Famous, But Why?
    Shared duties have always been the most critical component of every successful organization, regardless of its nature or size. When it…  ( 4 min )
  • Open

    Welcome ‘In the NVIDIA Studio’: A Weekly Celebration of Extraordinary Artists, Their Inspiring Art and Innovative Techniques
    Creating content is no longer tethered to using paint and stone as mediums, nor being in massive studios. Visual art can now be created anywhere, anytime. But being creative is still challenging and time-consuming. NVIDIA is making artistic workflows easier and faster by giving creators tools that enable them to remain in their flow state. The post Welcome ‘In the NVIDIA Studio’: A Weekly Celebration of Extraordinary Artists, Their Inspiring Art and Innovative Techniques appeared first on NVIDIA Blog.  ( 4 min )

  • Open

    What are your hopes and worries about future humanoid artificial intelligence coming soon?
    submitted by /u/Upset_Force66 [link] [comments]
    AI Dream 36 - Psychedelic Special (4K 40Mbit Test)
    submitted by /u/LordPewPew777 [link] [comments]
    AI Startups and the Hunt for Tech Talent in Vietnam
    submitted by /u/regalalgorithm [link] [comments]
    We don't have echolocation
    submitted by /u/tezdhar [link] [comments]
    These 3-Michelin-starred plates were invented by AI. The food doesn’t even exist
    submitted by /u/jonfla [link] [comments]
    Getting in shape while homeworking by force locking the screen and using blazepose pose estimation to detect pushups to unlock it again.
    submitted by /u/ThePyCoder [link] [comments]
    Last Week in AI: AI chip startup funding doubled in the last 5 years, new AI applications in hospitals and restaurants, Cruise robotaxi pulled over by police in SF, and more!
    https://lastweekin.ai/p/163?s=w submitted by /u/regalalgorithm [link] [comments]  ( 1 min )
    Youtubers create a completely AI "influencer."
    submitted by /u/savetheattack [link] [comments]
    FOMO is a TinyML neural network for real-time object detection
    submitted by /u/bendee983 [link] [comments]
    An online course with an AI tutor achieves a significantly higher completion rate than traditional online courses thanks to a personalized learning experience.
    submitted by /u/much_successes [link] [comments]  ( 1 min )
    Protein Folding Neural Networks (e.g RoseTTAFold) Are Not Robust
    submitted by /u/qptbook [link] [comments]
    Witch of the Barthe
    submitted by /u/Hacknaut [link] [comments]
    Why is it called tensorflow and not matrixflow?
    Hello, I'm MB. A very nice and polite guy. Why is it called tensorflow and not matrixflow? AI is all about matrix multiplications, right? So why use the word tensor instead? I know what a tensor is, kind of. But isn't AI about matrix multiplications primarily rather than tensor multiplications. ELI5 please. submitted by /u/MountBlanc [link] [comments]  ( 2 min )
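    One practical angle on the question, offered as a general observation and not as the official naming story: the arrays that flow through a network often have more than two axes, so "matrix" (exactly two axes) undersells them. The sketch below uses plain nested lists and an illustrative `shape` helper to contrast a matrix with a batch of small RGB images.

```python
def shape(nested):
    """Shape of a regular, non-empty nested list, e.g. [[1, 2], [3, 4]] -> (2, 2)."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)

# A weight matrix has two axes: rows and columns.
matrix = [[0.0] * 3 for _ in range(2)]

# A batch of 8 images, each 4x4 pixels with 3 colour channels, has four axes.
images = [[[[0.0] * 3 for _ in range(4)] for _ in range(4)] for _ in range(8)]

matrix_rank = len(shape(matrix))
images_rank = len(shape(images))
```

    Matrix multiplication is still the workhorse operation, but it is applied across these extra batch, height, width and channel axes.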
    Society
    submitted by /u/booksmoothie [link] [comments]
    Bioinspired multisensory neural network with crossmodal integration and recognition
    submitted by /u/booksmoothie [link] [comments]
    Realistic animal movement
    I am working on a robotic pet that has lots of movement capability but is simply scripted and will unnaturally jump between movement sets without considering the current movement. What branch of AI should I look into learning about? Currently I mostly use Python for high-level code and C for microcontrollers. submitted by /u/uMinded [link] [comments]
  • Open

    A3C vs federated learning?
    Hi, I see this question was asked before but I am still not convinced there is a difference between the two. How is asynchronous distributed RL (A3C) and federated learning different? It seems like the basic idea behind them is the same— the agents train in their own environments and only share gradients with the server. Is the difference only in terms of the domain they are applied in? Is it just ML vs RL? submitted by /u/uneasy_daisy [link] [comments]  ( 1 min )
    Can polyak averaging neural networks lead to numerical instability?
    In Soft Actor Critic several Q networks are used. Target Q networks are gradually updated to match other Q networks. See step 15 here: https://spinningup.openai.com/en/latest/algorithms/sac.html#pseudocode I've heard this called polyak averaging. Let's say we have two weights from two neural networks: W1 from one network, and W2 is the corresponding weight from the other network. Polyak averaging averages these weights as follows: W_average = W1 * p + W2 * (1-p) When p is 0.5, it's an evenly weighted average. If p is high, then W1 is weighted more heavily than W2, etc. My question is: Does this method of averaging weights lead to numerically unstable neural networks? This technique is often used to gradually transform one neural network into another on a weight by weight basis, but there is no guarantee that all intermediate neural networks are well behaved (at least, none that I'm aware of). Whereas, gradient descent with small enough step sizes should, theoretically, keep a neural network well behaved, I think those same theoretical guarantees apply to polyak averaging neural networks. What do you think? submitted by /u/Buttons840 [link] [comments]  ( 2 min )
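    The averaging formula in the post can be sketched on flat lists of weights as follows (names are illustrative; real implementations apply the same rule to every parameter tensor). For p between 0 and 1 each new weight is a convex combination, so it always lies between the two weights it was built from. That bounds the weights themselves; it says nothing, on its own, about how the intermediate network behaves as a function.

```python
def polyak_update(target_weights, online_weights, p):
    """Return p * target + (1 - p) * online for each corresponding pair of weights."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return [p * t + (1.0 - p) * w for t, w in zip(target_weights, online_weights)]

target = [0.0, 2.0]
online = [1.0, 4.0]
# Repeated updates move the target geometrically toward the online weights.
for _ in range(3):
    target = polyak_update(target, online, p=0.5)
```

    After three updates with p = 0.5 the target weights are `[0.875, 3.75]`, seven eighths of the way to the online weights.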
  • Open

    [D] Word Meaning Dictionary Dataset
    Hey all! So I intend to make an application that, very naively speaking, outputs synonyms of a given word regardless of context (like if word1 is "bank", the model should output both "money" and "river", and the order does not matter). For this, I intend to use a Doc2Vec type of classifier, where the meanings of each word can serve as a document, and then similar words can easily be returned using a cosine similarity function. I chose this over a classic Word2Vec as this will be able to predict uncommon words (which blimey the English language has a lot of) which would otherwise be processed as tokens. To this end, I am searching for a suitable dataset. Any ideas? submitted by /u/GrammarPaparazzi [link] [comments]  ( 1 min )
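    The retrieval step described above (rank words by cosine similarity of their definition vectors) can be sketched as follows. The three-dimensional vectors are made-up stand-ins for Doc2Vec embeddings, and `most_similar` is a hypothetical helper, not a library call.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention for zero vectors
    return dot / (norm_u * norm_v)


def most_similar(query, vectors, top_k=2):
    """Return the top_k words whose vectors are closest to the query word."""
    scored = [
        (word, cosine_similarity(vectors[query], vec))
        for word, vec in vectors.items()
        if word != query
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in scored[:top_k]]


# Toy embeddings (assumed): "bank" shares one axis with each of its senses.
toy_vectors = {
    "bank": [1.0, 1.0, 0.0],
    "money": [1.0, 0.0, 0.0],
    "river": [0.0, 1.0, 0.0],
    "guitar": [0.0, 0.0, 1.0],
}

print(most_similar("bank", toy_vectors))
```

    With these toy vectors both senses of "bank" score equally, which matches the stated goal that the order of "money" and "river" does not matter.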
    [D] Is there a way to use a series of videos as the predictor variable for prediction/regression?
    This is the problem area I am working with: I have a series of videos taken at different times, and each video is paired with a physical variable. The videos contain information that correlates with the physical variable. What we want to do is use the information encoded within each video to build a correlation model with the physical quantity, and thereafter use new videos to predict the physical quantity. (We want to avoid the route of video -> CNN -> extract parameters -> build model with parameters. Instead, we want to directly go from the videos to the model without separately extracting parameters.) So, in a way, I want to use a series of videos as a time series data set. Is there a way to do this? What should be the starting point for my research into this? Thanks in advance! I am not an expert with this area at all, and would greatly appreciate guidance from the community. submitted by /u/besse [link] [comments]  ( 1 min )
    [P] Blog post + open-source PyTorch implementation of DeepMind's SIMONe (unsupervised scene decomposition)
    Hi all! My team recently reproduced and published a PyTorch implementation of the paper SIMONe: View-Invariant, Temporally-Abstracted Object Representations via Unsupervised Video Decomposition. Our blog post walks through the code and provides a detailed explanation of the architecture they use in order to perform object segmentation on videos in a fully self-supervised manner. Hope this is helpful/interesting to others! submitted by /u/ai_ellie [link] [comments]  ( 1 min )
    [D] AutoRF vs SinNeRF
    Both approaches seem to be able to render complex scenes from a single view, without the need for explicit priors or pretrained feature extractors. Conveniently, AutoRF doesn't mention SinNeRF. What are the similarities and differences among the two approaches? DISCLAIMER - I'm not a NeRF expert. My limited understanding of it is that we train a small MLP to regress the radiance field for a scene, i.e., to predict emitted radiance at a point (x,y,z) in the viewing direction (θ, φ). Once we have the radiance field, we can use some rendering engine to render a 2D view from the 3D field and the camera parameters. EDIT: I just realized that I didn't link the papers, how silly of me. Here they are: SinNeRF: https://arxiv.org/abs/2204.00928 AutoRF: https://arxiv.org/abs/2204.03593 submitted by /u/Best-Neat-9439 [link] [comments]  ( 1 min )
    [P] Evaluating automatic paraphrasing via BLEU, LaBSE, Perplexity and Jaccard similarity index - how we do it for Linguix Paraphraser 2.0
    Hey everyone! Our NLP team, led by our expert Daria, has recently released a new AI-based paraphrasing feature – Linguix Paraphraser 2.0. To measure its quality, we use four important metrics: BLEU, Jaccard similarity index, LaBSE and Perplexity. Performance stats: BLEU, which is used for measuring the quality of machine translation. The lower it is for a rephrasing task, the better. Right now, Linguix Paraphraser 2.0 has a BLEU score of 0.47 (the previous iteration had 0.65). So, we can say that our paraphraser is now smarter: it uses more new words to rewrite the sentence, but the overall idea of the content is still preserved. The Jaccard similarity index is used to measure the likeness of two objects x and y. As with BLEU, the lower the index for this task, the better. Our current score is 0.45 compared to 0.51 for the previous iteration. The LaBSE metric is used to measure the semantic similarity of two sentences. It translates text into vectors so that vectors of texts close in meaning are geometrically close to each other. The higher the metric, the better. The new model has slightly lower LaBSE similarity than the previous model: 0.80 vs 0.93, which is normal and correct, because the model generates a variety of variants using other words while keeping the meaning of the source text in the target. Perplexity is used to ensure the rewritten content sounds natural (lower perplexity is better). The naturalness of the rewrites generated by our new paraphraser is much better than before: 0.26 vs 4.99 for the prior version. https://i.redd.it/iaaf7o2iibu81.gif As such, for Linguix Paraphraser 2.0 we were able to improve the quality of the rephrased content, while keeping the text meaning at the same level. P.S. Daria is somewhat shy, so I asked her to share the update here on her behalf. Anyway she'll be pleased to see some feedback! submitted by /u/alexlash [link] [comments]  ( 1 min )
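    Of the four metrics, the Jaccard index is the simplest to reproduce. The sketch below computes it over lowercase whitespace tokens; the post does not say how Linguix tokenizes text, so that choice, the function name `jaccard_index`, and the example sentences are all assumptions.

```python
def jaccard_index(text_a, text_b):
    """|A ∩ B| / |A ∪ B| over the sets of lowercase whitespace tokens."""
    tokens_a = set(text_a.lower().split())
    tokens_b = set(text_b.lower().split())
    union = tokens_a | tokens_b
    if not union:
        return 1.0  # two empty texts are treated as identical
    return len(tokens_a & tokens_b) / len(union)


source = "the cat sat on the mat"
paraphrase = "the cat rested on the rug"

# Shared tokens: the, cat, on (3). Union: 7 distinct tokens. Score: 3/7.
print(jaccard_index(source, paraphrase))
```

    A lower score means the paraphrase reuses fewer of the source words, which is why the post treats a decrease as an improvement for this task.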
    [R] [P] Slideflow: a deep learning framework for digital histology
    Hi all - I'm an applied ML researcher working in an oncology research lab at U Chicago, using digital slides of patients' tumors for tumor classification, prognostication, and treatment response prediction. I'm really excited to share with the community the deep learning tools we've been using, and I'm hoping for any feedback you might have (or direction if you think there's a community or subreddit this might be better suited for). After years of development, we've released our open-source deep learning framework for digital histology, Slideflow (https://github.com/jamesdolezal/slideflow). It has flexible and highly optimized whole-slide image processing, support for a wide variety of existing and custom architectures (with continuous, categorical, or time-series outcomes), real-time digital stain normalization, a number of explainability tools, and integrated uncertainty quantification. It's compatible with both Tensorflow and PyTorch, available on PyPI and DockerHub, and comes with good documentation (https://slideflow.dev/). We've tried out a number of alternative frameworks over the years, and I think the ease of use, flexibility, and performance optimizations set it apart from other repos you'll find on GitHub. We have a handful of local collaborators who are using Slideflow, but I'm hoping to expand our reach and find people in similar fields who are interested in collaborating for ongoing open-source development. I've tried looking for subs relating specifically to computational pathology / digital histology, and haven't found a good community yet - anyone have ideas for how to get connected with like-minded people working in the same field? submitted by /u/shawarma_bees [link] [comments]  ( 2 min )
    [D] Anyone using named tensors or a tensor annotation lib productively?
    It seems like there have been some options out for a while now - e.g. native pytorch named tensors, tsalib, torchtyping - yet I haven't really seen them discussed or used in any code I've come across. Just wondering if anyone has surveyed them recently and is using them. In particular tsalib's warp string syntax for transformations looks really interesting. submitted by /u/patniemeyer [link] [comments]  ( 1 min )
    [D] Are there any analog A.I. computing chips on the retail market yet?
    If so, where to buy them? (For example: I read that Mythic was collecting funding in mid-2021, but I don't know if their chips are for sale anywhere.) submitted by /u/GerritTheBerrit [link] [comments]  ( 2 min )
    [N][R][P] High fidelity 3D face reconstruction from monocular image
    NextFace is an open source PyTorch library for high fidelity 3D face reconstruction from single/multiple RGB image(s). github.com/abdallahdib/NextFace https://reddit.com/link/u6e7cd/video/ixg0wlzirau81/player submitted by /u/Abd_dib [link] [comments]
    [D] Which keywords describe my task?
    Hey all, I have received a task in an area I am unfamiliar with and need a little help finding suitable papers, so I am looking for keywords. To illustrate the goal, let's say you have 10000 screws (which can be of the same model) and you want to be able to recognize/match each one. You want new screws to be added all the time, so you also want the case that the object could previously be unknown when performing the match. The goal is to develop a capturing system that produces suitable images and to find an architecture/algorithm that is as robust as possible. The object images should be invariant to illumination, rotation and translation during acquisition. It should be a kind of barcode/hash without any additional symbol, based only on the structure of the object. Is there a name for such a task? I think it is not really a classification in the classical sense. I guess it might be just a clever way of finding suitable features for each individual object structure and suitable distance function. Sorry for the long post, I appreciate any help. submitted by /u/Temporary_Lab769 [link] [comments]  ( 1 min )
    [D] Including outer objects in RNN / CNN
    Hello there, Which layer or structure would you append to existing machine learning architectures like yolov5 in order to not only detect the specific object, but also the object which it is part of? Let's say there are X-ray images of laptops: the laptop itself will be detected, and also something like the hard drive or battery inside of it. Is it possible to make the CNN/RNN aware of the fact that the hard drive or battery is inside the laptop? Hope someone can tell what I mean. Regards David submitted by /u/rohrivibes [link] [comments]  ( 1 min )
    [D] PhD in knowledge representation and reasoning for autonomous agent: research landscape
    I have been offered a PhD in the domain of knowledge representation and reasoning for autonomous agents. The goal is to represent textual rules and world knowledge and then use that represented knowledge for reasoning, so that the motion of an autonomous agent can be predicted. I have a question regarding the current landscape of knowledge representation and reasoning. I see more and more work on data-focused models, while the older logic-based approaches are fading out. The PhD project problem itself looks interesting, as it focuses on work that needs less data and can plan motion in unseen scenarios. But I am concerned about future career prospects in this domain, where the problem is tackled by knowledge representation and reasoning, as I can see there is less and less funding in this domain. What is your take on the future landscape of research directions in this domain? submitted by /u/human_treadstone [link] [comments]  ( 2 min )
    [P] My blog on ML model evaluation (Bayes optimal decisions, ROC curve, LLR calibration)
    I have published 3 articles about ML model evaluation on my personal blog. I just finished the 3rd installment, so I am keen to share and get some feedback. I cover frameworks traditionally used in ML like ROC curves, but from a Bayes decision perspective, which I have been struggling to find in textbooks/tutorials. The 3rd part is about the evaluation of log-likelihood calibrated models. Hope you will find it interesting/useful! https://mkffl.github.io/2021/10/18/Decisions-Part-1.html https://mkffl.github.io/2021/10/28/Decisions-Part-2.html https://mkffl.github.io/2022/03/02/Decisions-Part-3.html And the underlying code for reproducibility https://github.com/mkffl/decisions submitted by /u/mkffl [link] [comments]  ( 1 min )
    [P] ormb: Docker for Your Models, Help You Manage Models Better
    github.com/kleveross/ormb ormb helps you manage your Machine Learning/Deep Learning models with docker container image registry. It makes your models easy to create, version, share and publish.
    ```
    # Save the model in local cache first
    $ ormb save gaocegege/fashion_model:v1
    ref: gaocegege/fashion_model:v1
    digest: 6b08cd25d01f71a09c1eb852b3a696ee2806abc749628de28a71b507f9eab996
    size: 162.1 KiB
    format: SavedModel
    v1: saved

    # Push the model from local cache to remote registry
    $ ormb push gaocegege/fashion_model:v1
    The push refers to repository [gaocegege/fashion_model]
    ref: gaocegege/fashion_model:v1
    digest: 6b08cd25d01f71a09c1eb852b3a696ee2806abc749628de28a71b507f9eab996
    size: 162.1 KiB
    format: SavedModel
    v1: pushed to remote (1 layer, 162.1 KiB total)

    # Pull the model from remote registry to local cache
    $ ormb pull gaocegege/fashion_model:v1
    v1: Pulling from gaocegege/fashion_model
    ref: gaocegege/fashion_model:v1
    digest: 6b08cd25d01f71a09c1eb852b3a696ee2806abc749628de28a71b507f9eab996
    size: 162.1 KiB
    Status: Downloaded newer model for gaocegege/fashion_model:v1

    # Export the model from local cache to current directory
    $ ormb export gaocegege/fashion_model:v1
    ref: localhost/gaocegege/fashion_model:v1
    digest: 6b08cd25d01f71a09c1eb852b3a696ee2806abc749628de28a71b507f9eab996
    size: 162.1 KiB

    # View the local file directory
    $ tree examples/SavedModel-fashion
    examples/SavedModel-fashion
    ├── model
    │   ├── saved_model.pb
    │   └── variables
    │       ├── variables.data-00000-of-00001
    │       └── variables.index
    ├── ormbfile.yaml
    └── training-serving.ipynb

    2 directories, 5 files
    ```
    submitted by /u/gaocegege [link] [comments]  ( 1 min )
    [D] Deep Generative model with Hierarchical Latent Factors for Time Series Anomaly Detection
    Hi, I have just published my latest Medium article. Anomalies are widespread when working with data, and they matter especially in time series, so it is crucial to have efficient methods to detect and deal with them. This article illustrates a state-of-the-art model called DGHL for anomaly detection. DGHL uses a ConvNet as a generator and, instead of using an encoder, maximizes the likelihood with the Alternating Back-Propagation algorithm. https://rezayazdanfar.medium.com/deep-generative-model-with-hierarchical-latent-factors-for-time-series-anomaly-detection-8d6eaebad8bc submitted by /u/rezayazdanfar [link] [comments]  ( 1 min )
    [P] app to play with latent diffusion models
    just published “geni”, a new minimal app that uses Latent Diffusion Models. It will not produce DALL-E-ish results but it’s fast and great for playing with prompt engineering. Also, it’s free. would love to have the community playing with it. check it out here: https://geni.vercel.app submitted by /u/viccpopa [link] [comments]  ( 1 min )
    [R] VQ-Flows: Vector Quantized Local Normalizing Flows
    arXiv: https://arxiv.org/abs/2203.11556 Summary: We introduce a novel statistical framework for learning a mixture of local normalizing flows as "chart maps" over the data manifold. Our framework augments the expressivity of recent approaches while preserving the signature property of normalizing flows, that they admit exact density evaluation. We learn a suitable atlas of charts for the data manifold via a vector quantized auto-encoder (VQ-AE) and the distributions over them using a conditional flow. We validate experimentally that our probabilistic framework enables existing approaches to better model data distributions over complex manifolds. GitHub: Coming Soon Author here, happy to answer any questions. submitted by /u/tshrjn [link] [comments]  ( 1 min )
  • Open

    Does anyone have a guess as to why my network isn’t working? (more info in comments)
    submitted by /u/-i-hate-this-place- [link] [comments]  ( 1 min )
  • Open

    Guide to Iteratively Tuning GNNs
    Sponsored Post By Luis Bermudez This blog walks through a process for experimenting with hyperparameters, training algorithms and other parameters […] The post Guide to Iteratively Tuning GNNs appeared first on Machine Learning Mastery.  ( 6 min )
    Managing Data for Machine Learning Project
    Big data, labeled data, noisy data. Machine learning projects all need to look at data. Data is a critical aspect […] The post Managing Data for Machine Learning Project appeared first on Machine Learning Mastery.  ( 30 min )
  • Open

    7 Tips for making your code more ‘pythonic’ and elegant
    7 use-cases where you can make your python code more nifty, concise and elegant — without compromising readability. Continue reading on Becoming Human: Artificial Intelligence Magazine »  ( 4 min )
  • Open

    AI and Healthcare: AI as a Triaging Tool for Healthcare
    Healthcare offers one of the biggest areas where AI could impact people. AI in healthcare is already widespread but is expected to grow even further. The global artificial intelligence in healthcare market size was valued at USD 10.4 billion in 2021. It is expected to expand at a compound annual growth rate (CAGR) of 38.4%… The post AI and Healthcare: AI as a Triaging Tool for Healthcare appeared first on Data Science Central.  ( 3 min )
    Fallacy of Becoming Data-driven – Part 2: Cultural Transformation
    In my first blog of the series “Fallacy of Becoming Data-driven – Part 1: Becoming Value-obsessed”, I preached about the critical importance of reframing the conversation away from data-driven to becoming value-obsessed. Instead of focusing on becoming data-driven, organizations need to focus on how to uncover the customer, product, service, and operational insights buried in… The post Fallacy of Becoming Data-driven – Part 2: Cultural Transformation appeared first on Data Science Central.  ( 5 min )
    A Glossary of Knowledge Graph Terms
    As with many fields, knowledge graphs boast a wide array of specialized terms. This guide provides a handy reference to these concepts. Resource Description Framework (RDF) The Resource Description Framework (or RDF) is a conceptual framework established in the early 2000s by the World Wide Web Consortium for describing sets of interrelated assertions. RDF breaks… The post A Glossary of Knowledge Graph Terms appeared first on Data Science Central.  ( 10 min )
  • Open

    Auto-Gait: Automatic Ataxia Risk Assessment with Computer Vision on Gait Task Videos. (arXiv:2203.08215v2 [cs.CV] UPDATED)
    In this paper, we investigated whether we can 1) detect participants with ataxia-specific gait characteristics (risk-prediction), and 2) assess severity of ataxia from gait (severity-assessment) using computer vision. We created a dataset of 155 videos from 89 participants, 24 controls and 65 diagnosed with (or are pre-manifest) spinocerebellar ataxias (SCAs), performing the gait task of the Scale for the Assessment and Rating of Ataxia (SARA) from 11 medical sites located in 8 different states across the United States. We develop a computer vision pipeline to detect, track, and separate out the participants from their surroundings and construct several features from their body pose coordinates to capture gait characteristics like step width, step length, swing, stability, speed, etc. Our risk-prediction model achieves 83.06% accuracy and an 80.23% F1 score. Similarly, our severity-assessment model achieves a mean absolute error (MAE) score of 0.6225 and a Pearson's correlation coefficient score of 0.7268. Our models still performed competitively when evaluated on data from sites not used during training. Furthermore, through feature importance analysis, we found that our models associate wider steps, decreased walking speed, and increased instability with greater ataxia severity, which is consistent with previously established clinical knowledge. Our models create possibilities for remote ataxia assessment in non-clinical settings in the future, which could significantly improve accessibility of ataxia care. Furthermore, our underlying dataset was assembled from a geographically diverse cohort, highlighting its potential to further increase equity. The code used in this study is open to the public, and the anonymized body pose landmark dataset is also available upon request.
    Unconditional Image-Text Pair Generation with Multimodal Cross Quantizer. (arXiv:2204.07537v1 [cs.CV])
    Though deep generative models have gained a lot of attention, most of the existing works are designed for the unimodal generation task. In this paper, we explore a new method for unconditional image-text pair generation. We propose MXQ-VAE, a vector quantization method for multimodal image-text representation. MXQ-VAE accepts a paired image and text as input, and learns a joint quantized representation space, so that the image-text pair can be converted to a sequence of unified indices. Then we can use autoregressive generative models to model the joint image-text representation, and even perform unconditional image-text pair generation. Extensive experimental results demonstrate that our approach effectively generates semantically consistent image-text pairs and also enhances meaningful alignment between image and text.  ( 2 min )
    Effects of Multi-Aspect Online Reviews with Unobserved Confounders: Estimation and Implication. (arXiv:2110.01746v2 [cs.LG] UPDATED)
    Online review systems are the primary means through which many businesses seek to build the brand and spread their messages. Prior research studying the effects of online reviews has been mainly focused on a single numerical cause, e.g., ratings or sentiment scores. We argue that such notions of causes entail three key limitations: they solely consider the effects of single numerical causes and ignore different effects of multiple aspects -- e.g., Food, Service -- embedded in the textual reviews; they assume the absence of hidden confounders in observational studies, e.g., consumers' personal preferences; and they overlook the indirect effects of numerical causes that can potentially cancel out the effect of textual reviews on business revenue. We thereby propose an alternative perspective to this single-cause-based effect estimation of online reviews: in the presence of hidden confounders, we consider multi-aspect textual reviews, particularly, their total effects on business revenue and direct effects with the numerical cause -- ratings -- being the mediator. We draw on recent advances in machine learning and causal inference to together estimate the hidden confounders and causal effects. We present empirical evaluations using real-world examples to discuss the importance and implications of differentiating the multi-aspect effects in strategizing business operations.  ( 2 min )
    Multi-domain Integrative Swin Transformer network for Sparse-View Tomographic Reconstruction. (arXiv:2111.14831v7 [eess.IV] UPDATED)
    Decreasing projection views to lower X-ray radiation dose usually leads to severe streak artifacts. To improve image quality from sparse-view data, a Multi-domain Integrative Swin Transformer network (MIST-net) was developed in this article. First, MIST-net incorporated lavish domain features from data, residual-data, image, and residual-image using flexible network architectures, where the residual-data and residual-image sub-networks were considered as data consistency modules to eliminate interpolation and reconstruction errors. Second, a trainable edge enhancement filter was incorporated to detect and protect image edges. Third, a high-quality reconstruction Swin transformer (i.e., Recformer) was designed to capture image global features. The experimental results on numerical and real cardiac clinical datasets with 48 views demonstrated that our proposed MIST-net provided better image quality with more small features and sharp edges than other competitors.
    GCR: Gradient Coreset Based Replay Buffer Selection For Continual Learning. (arXiv:2111.11210v3 [cs.LG] UPDATED)
    Continual learning (CL) aims to develop techniques by which a single model adapts to an increasing number of tasks encountered sequentially, thereby potentially leveraging learnings across tasks in a resource-efficient manner. A major challenge for CL systems is catastrophic forgetting, where earlier tasks are forgotten while learning a new task. To address this, replay-based CL approaches maintain and repeatedly retrain on a small buffer of data selected across encountered tasks. We propose Gradient Coreset Replay (GCR), a novel strategy for replay buffer selection and update using a carefully designed optimization criterion. Specifically, we select and maintain a "coreset" that closely approximates the gradient of all the data seen so far with respect to current model parameters, and discuss key strategies needed for its effective application to the continual learning setting. We show significant gains (2%-4% absolute) over the state-of-the-art in the well-studied offline continual learning setting. Our findings also effectively transfer to online / streaming CL settings, showing up to 5% gains over existing approaches. Finally, we demonstrate the value of supervised contrastive loss for continual learning, which yields a cumulative gain of up to 5% accuracy when combined with our subset selection strategy.
    Learning to Accelerate by the Methods of Step-size Planning. (arXiv:2204.01705v3 [cs.LG] UPDATED)
    Gradient descent is slow to converge for ill-conditioned problems and non-convex problems. An important technique for acceleration is step-size adaptation. The first part of this paper contains a detailed review of step-size adaptation methods, including Polyak step-size, L4, LossGrad, Adam, IDBD, and Hypergradient descent, and the relation of step-size adaptation to meta-gradient methods. In the second part of this paper, we propose a new class of methods of accelerating gradient descent that have some distinctiveness from existing techniques. The new methods, which we call step-size planning, use the update experience to learn an improved way of updating the parameters. The methods organize the experience into $K$ steps away from each other to facilitate planning. From the past experience, our planning algorithm, Csawg, learns a step-size model which is a form of multi-step machine that predicts future updates. We extend Csawg to apply step-size planning over multiple steps, which leads to further speedup. We discuss and highlight the projection power of the diagonal-matrix step-size for future large scale applications. We show that for a convex problem, our methods can surpass the convergence rate of Nesterov's accelerated gradient, $1 - \sqrt{\mu/L}$, where $\mu, L$ are the strongly convex factor of the loss function $F$ and the Lipschitz constant of $F'$, which is the theoretical limit for the convergence rate of first-order methods. On the well-known non-convex Rosenbrock function, our planning methods achieve zero error below 500 gradient evaluations, while gradient descent takes about 10000 gradient evaluations to reach a $10^{-3}$ accuracy. We discuss the connection of step-size planning to planning in reinforcement learning, in particular, Dyna architectures.  ( 2 min )
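    For context on the baseline mentioned in the abstract, here is a minimal sketch of plain fixed-step gradient descent on the two-dimensional Rosenbrock function. It is not the paper's Csawg method; the starting point (-1.2, 1.0), the step size 1e-3, and the iteration count are assumptions chosen for illustration.

```python
def rosenbrock(x, y):
    """Rosenbrock function; its global minimum is 0 at (1, 1)."""
    return (1.0 - x) ** 2 + 100.0 * (y - x * x) ** 2


def rosenbrock_grad(x, y):
    """Analytic gradient of the Rosenbrock function."""
    dx = -2.0 * (1.0 - x) - 400.0 * x * (y - x * x)
    dy = 200.0 * (y - x * x)
    return dx, dy


def gradient_descent(x, y, step_size, num_steps):
    """Fixed-step gradient descent; one gradient evaluation per step."""
    for _ in range(num_steps):
        dx, dy = rosenbrock_grad(x, y)
        x -= step_size * dx
        y -= step_size * dy
    return x, y


start = (-1.2, 1.0)  # assumed starting point
x, y = gradient_descent(*start, step_size=1e-3, num_steps=1000)
start_loss = rosenbrock(*start)
final_loss = rosenbrock(x, y)
print(start_loss, final_loss)
```

    The curved, narrow valley of this function forces a small fixed step, which is the ill-conditioning that step-size adaptation methods aim to work around.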
    Transfer Learning for Instance Segmentation of Waste Bottles using Mask R-CNN Algorithm. (arXiv:2204.07437v1 [cs.CV])
    This paper proposes a methodological approach with a transfer learning scheme for plastic waste bottle detection and instance segmentation using the mask region proposal convolutional neural network (Mask R-CNN). Plastic bottles constitute one of the major pollutants posing a serious threat to the environment both in oceans and on land. The automated identification and segregation of bottles can facilitate plastic waste recycling. We prepare a custom-made dataset of 192 bottle images with pixel-by-pixel polygon annotation for the automatic segmentation task. The proposed transfer learning scheme makes use of a Mask R-CNN model pre-trained on the Microsoft COCO dataset. We present a comprehensive scheme for fine-tuning the base pre-trained Mask R-CNN model on our custom dataset. Our final fine-tuned model has achieved 59.4 mean average precision (mAP), which corresponds to the MS COCO metric. The results indicate a promising application of deep learning for detecting waste bottles.  ( 2 min )
    Big-means: Less is More for K-means Clustering. (arXiv:2204.07485v1 [cs.LG])
    K-means clustering plays a vital role in data mining. However, its performance drastically drops when applied to huge amounts of data. We propose a new heuristic that is built on the basis of regular K-means for faster and more accurate big data clustering using the "less is more" and MSSC decomposition approaches. The main advantage of the proposed algorithm is that it naturally turns the K-means local search into a global one through the process of decomposition of the MSSC problem. On one hand, decomposition of the MSSC problem into smaller subproblems reduces the computational complexity and allows for their parallel processing. On the other hand, the MSSC decomposition provides a new method for the natural data-driven shaking of the incumbent solution while introducing a new neighborhood structure for the solution of the MSSC problem. This leads to a new heuristic that improves K-means in big data conditions. The scalability of the algorithm to big data can be easily adjusted by choosing the appropriate number of subproblems and their size. The proposed algorithm is both scalable and accurate. In our experiments it outperforms all recent state-of-the-art algorithms for the MSSC in terms of time as well as the solution quality.  ( 2 min )
    Towards PAC Multi-Object Detection and Tracking. (arXiv:2204.07482v1 [cs.CV])
    Accurately detecting and tracking multiple objects is important for safety-critical applications such as autonomous navigation. However, it remains challenging to provide guarantees on the performance of state-of-the-art techniques based on deep learning. We consider a strategy known as conformal prediction, which predicts sets of labels instead of a single label; in the classification and regression settings, these algorithms can guarantee that the true label lies within the prediction set with high probability. Building on these ideas, we propose multi-object detection and tracking algorithms that come with probably approximately correct (PAC) guarantees. They do so by constructing a prediction set around each object detection as well as around the set of edge transitions; given an object, the detection prediction set contains its true bounding box with high probability, and the edge prediction set contains its true transition across frames with high probability. We empirically demonstrate that our method can detect and track objects with PAC guarantees on the COCO and MOT-17 datasets.
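    The set-valued idea behind conformal prediction can be sketched for plain classification. This is the standard split conformal recipe, not the PAC detection and tracking algorithm proposed in the paper; the calibration scores, class names, and function names below are made up for illustration.

```python
import math


def conformal_threshold(calibration_scores, alpha):
    """Score cutoff from held-out nonconformity scores.

    Each score is 1 - (predicted probability of the true label). The cutoff
    is the ceil((n + 1) * (1 - alpha))-th smallest calibration score.
    """
    n = len(calibration_scores)
    rank = math.ceil((n + 1) * (1.0 - alpha))
    if rank > n:
        return float("inf")  # too few calibration points: include everything
    return sorted(calibration_scores)[rank - 1]


def prediction_set(class_probs, threshold):
    """All labels whose nonconformity score does not exceed the cutoff."""
    return {label for label, p in class_probs.items() if 1.0 - p <= threshold}


# Hypothetical calibration scores from seven held-out examples.
scores = [0.3, 0.1, 0.7, 0.5, 0.2, 0.6, 0.4]
threshold = conformal_threshold(scores, alpha=0.25)

# Hypothetical softmax output for a new input.
probs = {"car": 0.5, "truck": 0.45, "bus": 0.05}
print(threshold, prediction_set(probs, threshold))
```

    When calibration and test examples are exchangeable, sets built this way contain the true label with probability at least 1 - alpha on average; ambiguous inputs simply produce larger sets, as with "car" and "truck" here.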
    A Reinforcement Learning Approach to Parameter Selection for Distributed Optimal Power Flow. (arXiv:2110.11991v2 [eess.SY] UPDATED)
    With the increasing penetration of distributed energy resources, distributed optimization algorithms have attracted significant attention for power systems applications due to their potential for superior scalability, privacy, and robustness to a single point-of-failure. The Alternating Direction Method of Multipliers (ADMM) is a popular distributed optimization algorithm; however, its convergence performance is highly dependent on the selection of penalty parameters, which are usually chosen heuristically. In this work, we use reinforcement learning (RL) to develop an adaptive penalty parameter selection policy for the AC optimal power flow (ACOPF) problem solved via ADMM with the goal of minimizing the number of iterations until convergence. We train our RL policy using deep Q-learning, and show that this policy can result in significantly accelerated convergence (up to a 59% reduction in the number of iterations compared to existing, curvature-informed penalty parameter selection methods). Furthermore, we show that our RL policy demonstrates promise for generalizability, performing well under unseen loading schemes as well as under unseen losses of lines and generators (up to a 50% reduction in iterations). This work thus provides a proof-of-concept for using RL for parameter selection in ADMM for power systems applications.
    Sequential Aggregation and Rematerialization: Distributed Full-batch Training of Graph Neural Networks on Large Graphs. (arXiv:2111.06483v3 [cs.LG] UPDATED)
    We present the Sequential Aggregation and Rematerialization (SAR) scheme for distributed full-batch training of Graph Neural Networks (GNNs) on large graphs. Large-scale training of GNNs has recently been dominated by sampling-based methods and methods based on non-learnable message passing. SAR, on the other hand, is a distributed technique that can train any GNN type directly on an entire large graph. The key innovation in SAR is the distributed sequential rematerialization scheme, which sequentially reconstructs and then frees pieces of the prohibitively large GNN computational graph during the backward pass. This results in excellent memory scaling behavior where the memory consumption per worker goes down linearly with the number of workers, even for densely connected graphs. Using SAR, we report the largest applications of full-batch GNN training to date, and demonstrate large memory savings as the number of workers increases. We also present a general technique based on kernel fusion and attention-matrix rematerialization to optimize both the runtime and memory efficiency of attention-based models. We show that, coupled with SAR, our optimized attention kernels lead to significant speedups and memory savings in attention-based GNNs. We have made the SAR GNN training library publicly available: https://github.com/IntelLabs/SAR
    Uncertainty-Aware Text-to-Program for Question Answering on Structured Electronic Health Records. (arXiv:2203.06918v2 [cs.CL] UPDATED)
    Question Answering on Electronic Health Records (EHR-QA) has a significant impact on the healthcare domain, and it is being actively studied. Previous research on structured EHR-QA focuses on converting natural language queries into query language such as SQL or SPARQL (NLQ2Query), so the problem scope is limited to pre-defined data types by the specific query language. In order to expand the EHR-QA task beyond this limitation to handle multi-modal medical data and solve complex inference in the future, more primitive systemic language is needed. In this paper, we design the program-based model (NLQ2Program) for EHR-QA as the first step towards the future direction. We tackle MIMICSPARQL*, the graph-based EHR-QA dataset, via a program-based approach in a semi-supervised manner in order to overcome the absence of gold programs. Without the gold program, our proposed model shows comparable performance to the previous state-of-the-art model, which is an NLQ2Query model (0.9% gain). In addition, for a reliable EHR-QA model, we apply the uncertainty decomposition method to measure the ambiguity in the input question. We empirically confirmed data uncertainty is most indicative of the ambiguity in the input question.
    The Importance of Landscape Features for Performance Prediction of Modular CMA-ES Variants. (arXiv:2204.07431v1 [cs.NE])
    Selecting the most suitable algorithm and determining its hyperparameters for a given optimization problem is a challenging task. Accurately predicting how well a certain algorithm could solve the problem is hence desirable. Recent studies in single-objective numerical optimization show that supervised machine learning methods can predict algorithm performance using landscape features extracted from the problem instances. Existing approaches typically treat the algorithms as black boxes, without consideration of their characteristics. To investigate whether a selection of landscape features that depends on algorithm properties could further improve regression accuracy, we consider the modular CMA-ES framework and estimate how much each landscape feature contributes to the best algorithm performance regression models. Exploratory data analysis performed on this data indicates that the set of most relevant features does not depend on the configuration of individual modules, but the influence that these features have on regression accuracy does. In addition, we show that by using classifiers that take into account the features' relevance to the model accuracy, we are able to predict the status of individual modules in the CMA-ES configurations.
    Two-Step Meta-Learning for Time-Series Forecasting Ensemble. (arXiv:2011.10545v2 [stat.ML] UPDATED)
    As the amount of collected historical data grows, business intelligence applications with automatic time-series forecasting are in high demand. While no single time series modeling method is universal to all types of dynamics, forecasting using an ensemble of several methods is often seen as a compromise. Instead of fixing ensemble diversity and size, we propose to predict these aspects adaptively using meta-learning. Meta-learning here considers two separate random forest regression models, built on 390 time-series features, to rank 22 univariate forecasting methods and recommend ensemble size. The forecasting ensemble is consequently formed from methods ranked as the best, and forecasts are pooled using either simple or weighted average (with a weight corresponding to reciprocal rank). The proposed approach was tested on 12561 micro-economic time-series (expanded to 38633 for various forecasting horizons) of the M4 competition, where meta-learning outperformed Theta and Comb benchmarks by relative forecasting errors for all data types and horizons. Best overall results were achieved by weighted pooling with a symmetric mean absolute percentage error of 9.21% versus 11.05% obtained using the Theta method.
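    The pooling step and the error metric named in this abstract can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the forecasts are assumed to be already ordered from best-ranked to worst, and the sMAPE formula follows the common M4-style definition (mean of 200 * |a - f| / (|a| + |f|)), which is assumed to be the one the authors report.

```python
def pool_forecasts(forecasts_by_rank, weighted=True):
    """Pool forecasts listed from best-ranked (rank 1) to worst.

    forecasts_by_rank: list of equal-length lists, one per forecasting method.
    weighted=True uses reciprocal-rank weights 1/1, 1/2, 1/3, ...;
    weighted=False gives the simple average.
    """
    n_methods = len(forecasts_by_rank)
    if weighted:
        weights = [1.0 / rank for rank in range(1, n_methods + 1)]
    else:
        weights = [1.0] * n_methods
    total = sum(weights)
    horizon = len(forecasts_by_rank[0])
    return [
        sum(w * f[t] for w, f in zip(weights, forecasts_by_rank)) / total
        for t in range(horizon)
    ]

def smape(actual, forecast):
    """Symmetric MAPE in percent; assumes |a| + |f| is never zero."""
    terms = [
        200.0 * abs(a - f) / (abs(a) + abs(f))
        for a, f in zip(actual, forecast)
    ]
    return sum(terms) / len(terms)

# Hypothetical two-step forecasts from the two best-ranked methods.
pooled = pool_forecasts([[10.0, 20.0], [40.0, 50.0]])
```

    With reciprocal-rank weights the best-ranked method counts twice as much as the second, three times as much as the third, and so on, so the pooled forecast stays closer to the top-ranked method than a simple average would.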
    On the Importance of Firth Bias Reduction in Few-Shot Classification. (arXiv:2110.02529v2 [cs.CV] UPDATED)
    Learning accurate classifiers for novel categories from very few examples, known as few-shot image classification, is a challenging task in statistical machine learning and computer vision. The performance in few-shot classification suffers from the bias in the estimation of classifier parameters; however, an effective underlying bias reduction technique that could alleviate this issue in training few-shot classifiers has been overlooked. In this work, we demonstrate the effectiveness of Firth bias reduction in few-shot classification. Theoretically, Firth bias reduction removes the $O(N^{-1})$ first order term from the small-sample bias of the Maximum Likelihood Estimator. Here we show that the general Firth bias reduction technique simplifies to encouraging uniform class assignment probabilities for multinomial logistic classification, and almost has the same effect in cosine classifiers. We derive an easy-to-implement optimization objective for Firth penalized multinomial logistic and cosine classifiers, which is equivalent to penalizing the cross-entropy loss with a KL-divergence between the uniform label distribution and the predictions. Then, we empirically evaluate that it is consistently effective across the board for few-shot image classification, regardless of (1) the feature representations from different backbones, (2) the number of samples per class, and (3) the number of classes. Finally, we show the robustness of Firth bias reduction, in the case of imbalanced data distribution. Our implementation is available at https://github.com/ehsansaleh/firth_bias_reduction
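    The objective described above, cross-entropy penalized by a KL divergence between the uniform label distribution and the predictions, can be written out for a single example as below. This is a sketch under stated assumptions rather than the authors' implementation (which is at the linked repository): the coefficient name `lam` and the single-example, logits-based interface are hypothetical.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]

def firth_penalized_loss(logits, label, lam):
    """Cross-entropy plus lam * KL(uniform || predictions) for one example."""
    log_p = log_softmax(logits)
    k = len(logits)
    cross_entropy = -log_p[label]
    # KL(U || p) = sum_k (1/K) * log((1/K) / p_k) = -log K - mean_k log p_k
    kl_uniform = -math.log(k) - sum(log_p) / k
    return cross_entropy + lam * kl_uniform

# Hypothetical 3-class example: a confident prediction is penalized more
# than it would be under plain cross-entropy (lam = 0).
plain = firth_penalized_loss([4.0, 0.0, 0.0], label=0, lam=0.0)
penalized = firth_penalized_loss([4.0, 0.0, 0.0], label=0, lam=1.0)
```

    The KL term is zero when the predicted distribution is uniform and grows as the prediction becomes more peaked, which is the sense in which the penalty encourages uniform class assignment probabilities.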
    Efficient Architecture Search for Diverse Tasks. (arXiv:2204.07554v1 [cs.LG])
    While neural architecture search (NAS) has enabled automated machine learning (AutoML) for well-researched areas, its application to tasks beyond computer vision is still under-explored. As less-studied domains are precisely those where we expect AutoML to have the greatest impact, in this work we study NAS for efficiently solving diverse problems. Seeking an approach that is fast, simple, and broadly applicable, we fix a standard convolutional network (CNN) topology and propose to search for the right kernel sizes and dilations its operations should take on. This dramatically expands the model's capacity to extract features at multiple resolutions for different types of data while only requiring search over the operation space. To overcome the efficiency challenges of naive weight-sharing in this search space, we introduce DASH, a differentiable NAS algorithm that computes the mixture-of-operations using the Fourier diagonalization of convolution, achieving both a better asymptotic complexity and an up-to-10x search time speedup in practice. We evaluate DASH on NAS-Bench-360, a suite of ten tasks designed for benchmarking NAS in diverse domains. DASH outperforms state-of-the-art methods in aggregate, attaining the best-known automated performance on seven tasks. Meanwhile, on six of the ten tasks, the combined search and retraining time is less than 2x slower than simply training a CNN backbone that is far less accurate.
    CryoRL: Reinforcement Learning Enables Efficient Cryo-EM Data Collection. (arXiv:2204.07543v1 [cs.LG])
    Single-particle cryo-electron microscopy (cryo-EM) has become one of the mainstream structural biology techniques because of its ability to determine high-resolution structures of dynamic bio-molecules. However, cryo-EM data acquisition remains expensive and labor-intensive, requiring substantial expertise. Structural biologists need a more efficient and objective method to collect the best data in a limited time frame. We formulate the cryo-EM data collection task as an optimization problem in this work. The goal is to maximize the total number of good images taken within a specified period. We show that reinforcement learning offers an effective way to plan cryo-EM data collection, successfully navigating heterogenous cryo-EM grids. The approach we developed, cryoRL, demonstrates better performance than average users for data collection under similar settings.
    Stretching Sentence-pair NLI Models to Reason over Long Documents and Clusters. (arXiv:2204.07447v1 [cs.CL])
    Natural Language Inference (NLI) has been extensively studied by the NLP community as a framework for estimating the semantic relation between sentence pairs. While early work identified certain biases in NLI models, recent advancements in modeling and datasets demonstrated promising performance. In this work, we further explore the direct zero-shot applicability of NLI models to real applications, beyond the sentence-pair setting they were trained on. First, we analyze the robustness of these models to longer and out-of-domain inputs. Then, we develop new aggregation methods to allow operating over full documents, reaching state-of-the-art performance on the ContractNLI dataset. Interestingly, we find NLI scores to provide strong retrieval signals, leading to more relevant evidence extractions compared to common similarity-based methods. Finally, we go further and investigate whole document clusters to identify both discrepancies and consensus among sources. In a test case, we find real inconsistencies between Wikipedia pages in different languages about the same topic.
    Soft Truncation: A Universal Training Technique of Score-based Diffusion Model for High Precision Score Estimation. (arXiv:2106.05527v4 [cs.LG] UPDATED)
    Recent advances in diffusion models bring state-of-the-art performance on image generation tasks. However, empirical results from previous research on diffusion models imply an inverse correlation between density estimation performance and sample generation performance. This paper shows that the inverse correlation arises because density estimation is mostly determined by small diffusion times, whereas sample generation mainly depends on large diffusion times. However, training a score network on both small and large diffusion times is demanding because of the loss imbalance issue. To successfully train the score network on both small and large diffusion times, this paper introduces a training technique, Soft Truncation, that softens the truncation time for every mini-batch update and is universally applicable to any type of diffusion model. It turns out that Soft Truncation is equivalent to a diffusion model with a general weight, and we prove the variational bound of the general weighted diffusion model. In view of this variational bound, Soft Truncation becomes a natural way to train the score network. In experiments, Soft Truncation achieves state-of-the-art performance on the CIFAR-10, CelebA, CelebA-HQ $256\times 256$, and STL-10 datasets.
    Deep learning model solves change point detection for multiple change types. (arXiv:2204.07403v1 [cs.LG])
    Change point detection aims to catch an abrupt disorder in a data distribution. Common approaches assume that there are only two fixed distributions for data: one before and another after a change point. Real-world data are richer than this assumption. There can be multiple different distributions before and after a change. We propose an approach that works in the multiple-distributions scenario. Our approach learns representations for semi-structured data suitable for change point detection, while a common classifier-based approach fails. Moreover, our model is more robust when predicting change points. The datasets used for benchmarking are sequences of images with and without change points in them.
    Characterizing metastable states with the help of machine learning. (arXiv:2204.07391v1 [physics.comp-ph])
    Present-day atomistic simulations generate long trajectories of ever more complex systems. Analyzing these data, discovering metastable states, and uncovering their nature is becoming increasingly challenging. In this paper, we first use the variational approach to conformation dynamics to discover the slowest dynamical modes of the simulations. This allows the different metastable states of the system to be located and organized hierarchically. The physical descriptors that characterize metastable states are discovered by means of a machine learning method. We show in the cases of two proteins, Chignolin and Bovine Pancreatic Trypsin Inhibitor, how such analysis can be effortlessly performed in a matter of seconds. Another strength of our approach is that it can be applied to the analysis of both unbiased and biased simulations.
    Enforcing fairness in private federated learning via the modified method of differential multipliers. (arXiv:2109.08604v2 [cs.LG] UPDATED)
    Federated learning with differential privacy, or private federated learning, provides a strategy to train machine learning models while respecting users' privacy. However, differential privacy can disproportionately degrade the performance of the models on under-represented groups, as these parts of the distribution are difficult to learn in the presence of noise. Existing approaches for enforcing fairness in machine learning models have considered the centralized setting, in which the algorithm has access to the users' data. This paper introduces an algorithm to enforce group fairness in private federated learning, where users' data does not leave their devices. First, the paper extends the modified method of differential multipliers to empirical risk minimization with fairness constraints, thus providing an algorithm to enforce fairness in the central setting. Then, this algorithm is extended to the private federated learning setting. The proposed algorithm, FPFL, is tested on a federated version of the Adult dataset and an "unfair" version of the FEMNIST dataset. The experiments on these datasets show how private federated learning accentuates unfairness in the trained models, and how FPFL is able to mitigate such unfairness.
    Theory-inspired Parameter Control Benchmarks for Dynamic Algorithm Configuration. (arXiv:2202.03259v2 [cs.NE] UPDATED)
    It has long been observed that the performance of evolutionary algorithms and other randomized search heuristics can benefit from a non-static choice of the parameters that steer their optimization behavior. Mechanisms that identify suitable configurations on the fly ("parameter control") or via a dedicated training process ("dynamic algorithm configuration") are therefore an important component of modern evolutionary computation frameworks. Several approaches to address the dynamic parameter setting problem exist, but we barely understand which ones to prefer for which applications. As in classical benchmarking, problem collections with a known ground truth can offer very meaningful insights in this context. Unfortunately, settings with well-understood control policies are very rare. One of the few exceptions for which we know which parameter settings minimize the expected runtime is the LeadingOnes problem. We extend this benchmark by analyzing optimal control policies that can select the parameters only from a given portfolio of possible values. This also allows us to compute optimal parameter portfolios of a given size. We demonstrate the usefulness of our benchmarks by analyzing the behavior of the DDQN reinforcement learning approach for dynamic algorithm configuration.
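    To make the benchmark setting concrete, the sketch below defines the LeadingOnes fitness function and a simple elitist search heuristic whose parameter, the number of bits flipped per step, is chosen from a fixed portfolio. It is an illustration only: the flip-k-bits heuristic, the `portfolio_policy` rule, and the portfolio `(1, 2, 4)` are assumptions made here and are not claimed to be the optimal policies analyzed in the paper.

```python
import random

def leading_ones(bits):
    """Number of consecutive 1-bits at the start of the bit string."""
    count = 0
    for b in bits:
        if b != 1:
            break
        count += 1
    return count

def flip_k_step(bits, k, rng):
    """Flip k distinct random positions; keep the offspring if it is not worse."""
    child = list(bits)
    for i in rng.sample(range(len(bits)), k):
        child[i] = 1 - child[i]
    return child if leading_ones(child) >= leading_ones(bits) else list(bits)

def portfolio_policy(fitness, n, portfolio=(1, 2, 4)):
    """Hypothetical control policy: pick the largest portfolio value that does
    not exceed n // (fitness + 1), falling back to the smallest value."""
    target = n // (fitness + 1)
    allowed = [k for k in portfolio if k <= target]
    return max(allowed) if allowed else min(portfolio)

def run(n, policy, steps, seed=0):
    """Run the heuristic for a fixed number of steps from a random start."""
    rng = random.Random(seed)
    bits = [rng.randint(0, 1) for _ in range(n)]
    for _ in range(steps):
        k = policy(leading_ones(bits), n)
        bits = flip_k_step(bits, k, rng)
    return bits

final = run(n=16, policy=portfolio_policy, steps=200)
```

    Swapping in a different `policy` function (for example one that always returns 1) gives a static baseline against which a portfolio-restricted control policy can be compared by counting steps until `leading_ones(bits) == n`.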
    Simple but Effective: CLIP Embeddings for Embodied AI. (arXiv:2111.09888v2 [cs.CV] UPDATED)
    Contrastive language image pretraining (CLIP) encoders have been shown to be beneficial for a range of visual tasks from classification and detection to captioning and image manipulation. We investigate the effectiveness of CLIP visual backbones for Embodied AI tasks. We build incredibly simple baselines, named EmbCLIP, with no task specific architectures, inductive biases (such as the use of semantic maps), auxiliary tasks during training, or depth maps -- yet we find that our improved baselines perform very well across a range of tasks and simulators. EmbCLIP tops the RoboTHOR ObjectNav leaderboard by a huge margin of 20 pts (Success Rate). It tops the iTHOR 1-Phase Rearrangement leaderboard, beating the next best submission, which employs Active Neural Mapping, and more than doubling the % Fixed Strict metric (0.08 to 0.17). It also beats the winners of the 2021 Habitat ObjectNav Challenge, which employ auxiliary tasks, depth maps, and human demonstrations, and those of the 2019 Habitat PointNav Challenge. We evaluate the ability of CLIP's visual representations at capturing semantic information about input observations -- primitives that are useful for navigation-heavy embodied tasks -- and find that CLIP's representations encode these primitives more effectively than ImageNet-pretrained backbones. Finally, we extend one of our baselines, producing an agent capable of zero-shot object navigation that can navigate to objects that were not used as targets during training. Our code and models are available at https://github.com/allenai/embodied-clip
    Nanorobot queue: Cooperative treatment of cancer based on team member communication and image processing. (arXiv:2111.11236v3 [cs.RO] UPDATED)
    Nanorobots have been used clinically for tasks such as gastroscopy, photoacoustic tomography has been proposed to guide nanorobots to deliver drugs at designated delivery points in real time, and there are cases of nanorobots eliminating "superbacteria" in blood. Nevertheless, most of these technologies are immature: they have low efficiency or low accuracy, or they cannot be mass produced. The most effective way to treat cancer at this stage is therefore still chemotherapy and radiotherapy. Patients suffer and cannot be cured. This paper therefore proposes an ideal model of a treatment method that can completely cure cancer: a cooperative treatment method based on a nanorobot queue, using team member communication and computer vision image classification (target detection).
    Grassmannian Optimization for Online Tensor Completion and Tracking with the t-SVD. (arXiv:2001.11419v4 [eess.SP] UPDATED)
    We propose a new fast streaming algorithm for the tensor completion problem of imputing missing entries of a low-tubal-rank tensor using the tensor singular value decomposition (t-SVD) algebraic framework. We show the t-SVD is a specialization of the well-studied block-term decomposition for third-order tensors, and we present an algorithm under this model that can track changing free submodules from incomplete streaming 2-D data. The proposed algorithm uses principles from incremental gradient descent on the Grassmann manifold of subspaces to solve the tensor completion problem with linear complexity and constant memory in the number of time samples. We provide a local expected linear convergence result for our algorithm. Our empirical results are competitive in accuracy but much faster in compute time than state-of-the-art tensor completion algorithms on real applications to recover temporal chemo-sensing and MRI data under limited sampling.
    An interpretable machine learning approach for ferroalloys consumptions. (arXiv:2204.07421v1 [cs.LG])
    This paper is devoted to a practical method for ferroalloy consumption modeling and optimization. We consider the problem of selecting the optimal process control parameters based on the analysis of historical data from sensors. We developed an approach that predicts the results of chemical reactions and gives ferroalloy consumption recommendations. The main features of our method are easy interpretation and noise resistance. Our approach is based on the k-means clustering algorithm, decision trees, and linear regression. The main idea of the method is to identify situations where processes go similarly. For this, we propose using a k-means based dataset clustering algorithm and a classification algorithm to determine the cluster. This algorithm can also be applied to various technological processes; in this article, we demonstrate its application in metallurgy. To test the application of the proposed method, we used it to optimize ferroalloy consumption in Basic Oxygen Furnace steelmaking when finishing steel in a ladle furnace. The minimum required element content for a given steel grade was selected as the predictive model's target variable, and the required amount of the element to be added to the melt as the optimized variable. Keywords: Clustering, Machine Learning, Linear Regression, Steelmaking, Optimization, Gradient Boosting, Artificial Intelligence, Decision Trees, Recommendation services
    Tighter Theory for Local SGD on Identical and Heterogeneous Data. (arXiv:1909.04746v4 [cs.LG] UPDATED)
    We provide a new analysis of local SGD, removing unnecessary assumptions and elaborating on the difference between two data regimes: identical and heterogeneous. In both cases, we improve the existing theory and provide values of the optimal stepsize and optimal number of local iterations. Our bounds are based on a new notion of variance that is specific to local SGD methods with different data. The tightness of our results is guaranteed by recovering known statements when we plug in $H=1$, where $H$ is the number of local steps. The empirical evidence further validates the severe impact of data heterogeneity on the performance of local SGD.
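    The structure of local SGD, with $H$ local steps per worker between averaging rounds, can be sketched on a toy problem. Everything specific here is an assumption for illustration: each worker m holds the one-dimensional quadratic f_m(x) = 0.5 * (x - c_m)^2, heterogeneous data is modeled by different centers c_m, and exact gradients replace stochastic ones so that the example is deterministic.

```python
def local_sgd(centers, x0, lr, local_steps, rounds):
    """Local gradient steps on f_m(x) = 0.5 * (x - c_m)^2, then averaging.

    centers: one c_m per worker (different values model heterogeneous data).
    local_steps: H, the number of local steps between communication rounds.
    """
    x = x0
    for _ in range(rounds):
        local_models = []
        for c in centers:
            x_m = x  # every worker starts the round from the shared model
            for _ in range(local_steps):
                x_m -= lr * (x_m - c)  # exact gradient of f_m at x_m
            local_models.append(x_m)
        # Communication step: average the local models.
        x = sum(local_models) / len(local_models)
    return x

# With H = 1 every step is followed by averaging, i.e. ordinary parallel SGD.
x_h1 = local_sgd(centers=[0.0, 4.0], x0=10.0, lr=0.1, local_steps=1, rounds=200)
x_h5 = local_sgd(centers=[0.0, 4.0], x0=10.0, lr=0.1, local_steps=5, rounds=50)
```

    On this toy problem both settings approach the average of the centers; the effects of heterogeneity that the paper analyzes appear once workers have different curvatures or noisy gradients, which this sketch deliberately leaves out.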
    SuperCone: Unified User Segmentation over Heterogeneous Experts via Concept Meta-learning. (arXiv:2203.07029v2 [cs.LG] UPDATED)
    We study the problem of user segmentation: given a set of users and one or more predefined groups or segments, assign users to their corresponding segments. As an example, for a segment indicating particular interest in a certain area of sports or entertainment, the task will be to predict whether each single user will belong to the segment. However, there may exist numerous long-tail prediction tasks that suffer from limited data availability and may be of heterogeneous nature, which makes them hard to capture using a single off-the-shelf model architecture. In this work, we present SuperCone, our unified predictive segments system that addresses the above challenges. It builds on top of a flat concept representation that summarizes each user's heterogeneous digital footprints, and uniformly models each of the prediction tasks using an approach called "super learning", that is, combining prediction models with diverse architectures or learning methods that are not compatible with each other. Following this, we provide an end-to-end approach that learns to flexibly attend to the best-suited heterogeneous experts adaptively, while at the same time incorporating deep representations of the input concepts that augment the above experts. Experiments show that SuperCone significantly outperforms state-of-the-art recommendation and ranking algorithms on a wide range of predictive segment tasks and public structured data learning benchmarks.
    Safe Reinforcement Learning Using Black-Box Reachability Analysis. (arXiv:2204.07417v1 [cs.RO])
    Reinforcement learning (RL) is capable of sophisticated motion planning and control for robots in uncertain environments. However, state-of-the-art deep RL approaches typically lack safety guarantees, especially when the robot and environment models are unknown. To justify widespread deployment, robots must respect safety constraints without sacrificing performance. Thus, we propose a Black-box Reachability-based Safety Layer (BRSL) with three main components: (1) data-driven reachability analysis for a black-box robot model, (2) a trajectory rollout planner that predicts future actions and observations using an ensemble of neural networks trained online, and (3) a differentiable polytope collision check between the reachable set and obstacles that enables correcting unsafe actions. In simulation, BRSL outperforms other state-of-the-art safe RL methods on a Turtlebot 3, a quadrotor, and a trajectory-tracking point mass with an unsafe set adjacent to the area of highest reward.
    Model-Based Deep Learning of Joint Probabilistic and Geometric Shaping for Optical Communication. (arXiv:2204.07457v1 [eess.SP])
    Autoencoder-based deep learning is applied to jointly optimize geometric and probabilistic constellation shaping for optical coherent communication. The optimized constellation shaping outperforms the 256 QAM Maxwell-Boltzmann probabilistic distribution with extra 0.05 bits/4D-symbol mutual information for 64 GBd transmission over 170 km SMF link.
    A Machine Learning Tutorial for Operational Meteorology, Part I: Traditional Machine Learning. (arXiv:2204.07492v1 [physics.ao-ph])
    Recently, the use of machine learning in meteorology has increased greatly. While many machine learning methods are not new, university classes on machine learning are largely unavailable to meteorology students and are not required to become a meteorologist. The lack of formal instruction has contributed to the perception that machine learning methods are 'black boxes', and thus end-users are hesitant to apply machine learning methods in their everyday workflow. To reduce the opaqueness of machine learning methods and lower hesitancy towards machine learning in meteorology, this paper provides a survey of some of the most common machine learning methods. A familiar meteorological example is used to contextualize the machine learning methods while also discussing machine learning topics using plain language. The following machine learning methods are demonstrated: linear regression; logistic regression; decision trees; random forest; gradient boosted decision trees; naive Bayes; and support vector machines. Beyond discussing the different methods, the paper also contains discussions on the general machine learning process as well as best practices to enable readers to apply machine learning to their own datasets. Furthermore, all code (in the form of Jupyter notebooks and Google Colaboratory notebooks) used to make the examples in the paper is provided in an effort to catalyse the use of machine learning in meteorology.
    Latent Gaussian Model Boosting. (arXiv:2105.08966v4 [cs.LG] UPDATED)
    Latent Gaussian models and boosting are widely used techniques in statistics and machine learning. Tree-boosting shows excellent prediction accuracy on many data sets, but potential drawbacks are that it assumes conditional independence of samples, produces discontinuous predictions for, e.g., spatial data, and it can have difficulty with high-cardinality categorical variables. Latent Gaussian models, such as Gaussian process and grouped random effects models, are flexible prior models which explicitly model dependence among samples and which allow for efficient learning of predictor functions and for making probabilistic predictions. However, existing latent Gaussian models usually assume either a zero or a linear prior mean function which can be an unrealistic assumption. This article introduces a novel approach that combines boosting and latent Gaussian models to remedy the above-mentioned drawbacks and to leverage the advantages of both techniques. We obtain increased prediction accuracy compared to existing approaches in both simulated and real-world data experiments.
    Super Resolution for Turbulent Flows in 2D: Stabilized Physics Informed Neural Networks. (arXiv:2204.07413v1 [math.NA])
    We propose a new design of a neural network for solving a zero-shot super-resolution problem for turbulent flows. We embed a Luenberger-type observer into the network's architecture to inform the network of the physics of the process, and to provide error correction and stabilization mechanisms. In addition, to compensate for the decrease in the observer's performance due to the presence of unknown destabilizing forcing, the network is designed to estimate the contribution of the unknown forcing implicitly from the data over the course of training. By running a set of numerical experiments, we demonstrate that the proposed network does recover unknown forcing from data and is capable of predicting turbulent flows in high resolution from low-resolution noisy observations.
    Invariance Through Inference. (arXiv:2112.08526v2 [cs.LG] UPDATED)
    We introduce a general approach, called Invariance through Inference, for improving the test-time performance of an agent in deployment environments with unknown perceptual variations. Instead of producing invariant visual features through interpolation, invariance through inference turns adaptation at deployment-time into an unsupervised learning problem. This is achieved in practice by deploying a straightforward algorithm that tries to match the distribution of latent features to the agent's prior experience, without relying on paired data. Although simple, we show that this idea leads to surprising improvements on a variety of adaptation scenarios without access to deployment-time rewards, including changes in scene content, camera poses, and lighting conditions. We present results on challenging domains including distractor control suite and sim-to-real transfer for image-based robot manipulation.
    Narcissus: A Practical Clean-Label Backdoor Attack with Limited Information. (arXiv:2204.05255v2 [cs.CR] UPDATED)
    Backdoor attacks insert malicious data into a training set so that, during inference time, it misclassifies inputs that have been patched with a backdoor trigger as the malware specified label. For backdoor attacks to bypass human inspection, it is essential that the injected data appear to be correctly labeled. The attacks with such property are often referred to as "clean-label attacks." Existing clean-label backdoor attacks require knowledge of the entire training set to be effective. Obtaining such knowledge is difficult or impossible because training data are often gathered from multiple sources (e.g., face images from different users). It remains a question whether backdoor attacks still present a real threat. This paper provides an affirmative answer to this question by designing an algorithm to mount clean-label backdoor attacks based only on the knowledge of representative examples from the target class. With poisoning equal to or less than 0.5% of the target-class data and 0.05% of the training set, we can train a model to classify test examples from arbitrary classes into the target class when the examples are patched with a backdoor trigger. Our attack works well across datasets and models, even when the trigger presents in the physical world. We explore the space of defenses and find that, surprisingly, our attack can evade the latest state-of-the-art defenses in their vanilla form, or after a simple twist, we can adapt to the downstream defenses. We study the cause of the intriguing effectiveness and find that because the trigger synthesized by our attack contains features as persistent as the original semantic features of the target class, any attempt to remove such triggers would inevitably hurt the model accuracy first.
    Experimentally realized memristive memory augmented neural network. (arXiv:2204.07429v1 [cs.ET])
    Lifelong on-device learning is a key challenge for machine intelligence, and this requires learning from few, often single, samples. Memory augmented neural network has been proposed to achieve the goal, but the memory module has to be stored in an off-chip memory due to its size. Therefore the practical use has been heavily limited. Previous works on emerging memory-based implementation have difficulties in scaling up because different modules with various structures are difficult to integrate on the same chip and the small sense margin of the content addressable memory for the memory module heavily limited the degree of mismatch calculation. In this work, we implement the entire memory augmented neural network architecture in a fully integrated memristive crossbar platform and achieve an accuracy that closely matches standard software on digital hardware for the Omniglot dataset. The successful demonstration is supported by implementing new functions in crossbars in addition to widely reported matrix multiplications. For example, the locality-sensitive hashing operation is implemented in crossbar arrays by exploiting the intrinsic stochasticity of memristor devices. Besides, the content-addressable memory module is realized in crossbars, which also supports the degree of mismatches. Simulations based on experimentally validated models show such an implementation can be efficiently scaled up for one-shot learning on the Mini-ImageNet dataset. The successful demonstration paves the way for practical on-device lifelong learning and opens possibilities for novel attention-based algorithms not possible in conventional hardware.
    Rethinking Machine Learning Model Evaluation in Pathology. (arXiv:2204.05205v2 [eess.IV] UPDATED)
    Machine Learning has been applied to pathology images in research and clinical practice with promising outcomes. However, standard ML models often lack the rigorous evaluation required for clinical decisions. Machine learning techniques for natural images are ill-equipped to deal with pathology images that are significantly large and noisy, require expensive labeling, are hard to interpret, and are susceptible to spurious correlations. We propose a set of practical guidelines for ML evaluation in pathology that address the above concerns. The paper includes measures for setting up the evaluation framework, effectively dealing with variability in labels, and a recommended suite of tests to address issues related to domain shift, robustness, and confounding variables. We hope that the proposed framework will bridge the gap between ML researchers and domain experts, leading to wider adoption of ML techniques in pathology and improving patient outcomes.
    Synthesizing Informative Training Samples with GAN. (arXiv:2204.07513v1 [cs.LG])
    Remarkable progress has been achieved in synthesizing photo-realistic images with generative adversarial neural networks (GANs). Recently, GANs have been used as training sample generators when obtaining or storing real training data is expensive or even infeasible. However, images generated by traditional GANs are not as informative as real training samples when used to train deep neural networks. In this paper, we propose a novel method to synthesize Informative Training samples with GAN (IT-GAN). Specifically, we freeze a pre-trained GAN model and learn the informative latent vectors that correspond to informative training samples. The synthesized images are required to preserve information for training deep neural networks rather than visual reality or fidelity. Experiments verify that deep neural networks learn faster and achieve better performance when trained with images generated by our IT-GAN. We also show that our method is a promising solution to the dataset condensation problem.
    Streaming Align-Refine for Non-autoregressive Deliberation. (arXiv:2204.07556v1 [cs.CL])
    We propose a streaming non-autoregressive (non-AR) decoding algorithm to deliberate the hypothesis alignment of a streaming RNN-T model. Our algorithm facilitates a simple greedy decoding procedure, and at the same time is capable of producing the decoding result at each frame with limited right context, thus enjoying both high efficiency and low latency. These advantages are achieved by converting the offline Align-Refine algorithm to be streaming-compatible, with a novel transformer decoder architecture that performs local self-attentions for both text and audio, and a time-aligned cross-attention at each layer. Furthermore, we perform discriminative training of our model with the minimum word error rate (MWER) criterion, which has not been done in the non-AR decoding literature. Experiments on voice search datasets and Librispeech show that with reasonable right context, our streaming model performs as well as the offline counterpart, and discriminative training leads to further WER gain when the first-pass model has small capacity.
    Conditional Hierarchical Bayesian Tucker Decomposition for Genetic Data Analysis. (arXiv:1911.12426v3 [cs.LG] UPDATED)
    We develop methods for reducing the dimensionality of large data sets, common in biomedical applications. Learning about patients using genetic data often includes more features than observations, which makes direct supervised learning difficult. One method of reducing the feature space is to use latent Dirichlet allocation to group genetic variants in an unsupervised manner. Latent Dirichlet allocation describes a patient as a mixture of topics corresponding to genetic variants. This can be generalized as a Bayesian tensor decomposition to account for multiple feature variables. Our most significant contributions are with hierarchical topic modeling. We design distinct methods of incorporating hierarchical topic modeling, based on nested Chinese restaurant processes and Pachinko Allocation Machine, into Bayesian tensor decomposition. We apply these models to examine patients with one of four common types of cancer (breast, lung, prostate, and colorectal) and siblings with and without autism spectrum disorder. We linked the genes with their biological pathways and combine this information into a tensor of patients, counts of their genetic variants, and the genes' membership in pathways. We find that our trained models outperform baseline models, with respect to coherence, by up to 40%.
    Novelty Search in Representational Space for Sample Efficient Exploration. (arXiv:2009.13579v3 [cs.LG] UPDATED)
    We present a new approach for efficient exploration which leverages a low-dimensional encoding of the environment learned with a combination of model-based and model-free objectives. Our approach uses intrinsic rewards that are based on the distance of nearest neighbors in the low dimensional representational space to gauge novelty. We then leverage these intrinsic rewards for sample-efficient exploration with planning routines in representational space for hard exploration tasks with sparse rewards. One key element of our approach is the use of information theoretic principles to shape our representations in a way so that our novelty reward goes beyond pixel similarity. We test our approach on a number of maze tasks, as well as a control problem and show that our exploration approach is more sample-efficient compared to strong baselines.
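    The nearest-neighbor novelty signal described in this abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `knn_novelty`, the use of the mean Euclidean distance to the k nearest stored encodings, and the default `k=3` are all assumptions made for illustration, and the encodings are taken as given rather than learned.

```python
import math

def knn_novelty(z, memory, k=3):
    """Intrinsic reward sketch: mean Euclidean distance from the encoding z
    to its k nearest neighbours among previously visited encodings.

    Assumption: averaging the k smallest distances is one plausible choice;
    the abstract only says the reward is based on nearest-neighbour distance.
    """
    if not memory:
        return 0.0  # nothing to compare against yet
    dists = sorted(math.dist(z, m) for m in memory)
    nearest = dists[:k]
    return sum(nearest) / len(nearest)

# Hypothetical 2-D encodings of states visited so far.
visited = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
familiar = knn_novelty((0.0, 0.0), visited, k=2)
unfamiliar = knn_novelty((10.0, 10.0), visited, k=2)
```

    In this toy setup an encoding far from everything in memory receives a larger reward than one that has already been visited, which is the property an exploration bonus needs.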
    INSTA-BNN: Binary Neural Network with INSTAnce-aware Threshold. (arXiv:2204.07439v1 [cs.CV])
    Binary Neural Networks (BNNs) have emerged as a promising solution for reducing the memory footprint and compute costs of deep neural networks. BNNs, on the other hand, suffer from information loss because binary activations are limited to only two values, resulting in reduced accuracy. To improve the accuracy, previous studies have attempted to control the distribution of binary activation by manually shifting the threshold of the activation function or making the shift amount trainable. During the process, they usually depended on statistical information computed from a batch. We argue that using statistical data from a batch fails to capture the crucial information for each input instance in BNN computations, and the differences between statistical information computed from each instance need to be considered when determining the binary activation threshold of each instance. Based on the concept, we propose the Binary Neural Network with INSTAnce-aware threshold (INSTA-BNN), which decides the activation threshold value considering the difference between statistical data computed from a batch and each instance. The proposed INSTA-BNN outperforms the baseline by 2.5% and 2.3% on the ImageNet classification task with comparable computing cost, achieving 68.0% and 71.7% top-1 accuracy on ResNet-18 and MobileNetV1 based models, respectively.
    Neural Structured Prediction for Inductive Node Classification. (arXiv:2204.07524v1 [cs.LG])
    This paper studies node classification in the inductive setting, i.e., aiming to learn a model on labeled training graphs and generalize it to infer node labels on unlabeled test graphs. This problem has been extensively studied with graph neural networks (GNNs) by learning effective node representations, as well as traditional structured prediction methods for modeling the structured output of node labels, e.g., conditional random fields (CRFs). In this paper, we present a new approach called the Structured Proxy Network (SPN), which combines the advantages of both worlds. SPN defines flexible potential functions of CRFs with GNNs. However, learning such a model is nontrivial as it involves optimizing a maximin game with high-cost inference. Inspired by the underlying connection between joint and marginal distributions defined by Markov networks, we propose to solve an approximate version of the optimization problem as a proxy, which yields a near-optimal solution, making learning more efficient. Extensive experiments on two settings show that our approach outperforms many competitive baselines.
    Selecting Continuous Life-Like Cellular Automata for Halting Unpredictability: Evolving for Abiogenesis. (arXiv:2204.07541v1 [cs.NE])
    Substantial efforts have been applied to engineer CA with desired emergent properties, such as supporting gliders. Recent work in continuous CA has generated a wide variety of compelling bioreminiscent patterns, and the expansion of CA research into continuous numbers, multiple channels, and higher dimensions complicates their study. In this work we devise a strategy for evolving CA and CA patterns in two steps, based on the simple idea that CA are likely to be complex and computationally capable if they support patterns that grow indefinitely as well as patterns that vanish completely, and if the difference between the two is difficult to predict in advance. The second part of our strategy evolves patterns by selecting for mobility and conservation of mean cell value. We validate our pattern evolution method by re-discovering gliders in 17 of 17 Lenia CA, and also report 5 new evolved CA that support evolved glider patterns, differing from previously reported Lenia patterns. The CA reported here share neighborhood kernels with previously described Lenia CA, but exhibit a wider range of typical dynamics than their Lenia counterparts. Code for evolving continuous CA is made available under an MIT License.
    Deep Learning-based List Sphere Decoding for Faster-than-Nyquist (FTN) Signaling Detection. (arXiv:2204.07569v1 [cs.IT])
    Faster-than-Nyquist (FTN) signaling is a candidate non-orthonormal transmission technique to improve the spectral efficiency (SE) of future communication systems. However, such improvements of the SE are at the cost of additional computational complexity to remove the intentionally introduced intersymbol interference. In this paper, we investigate the use of deep learning (DL) to reduce the detection complexity of FTN signaling. To eliminate the need for a noise whitening filter at the receiver, we first present an equivalent FTN signaling model based on using a set of orthonormal basis functions and identify its operation region. Second, we propose a DL-based list sphere decoding (DL-LSD) algorithm that selects and updates the initial radius of the original LSD to guarantee a pre-defined number $N_{\text{L}}$ of lattice points inside the hypersphere. This is achieved by training a neural network to output an approximate initial radius that includes $N_{\text{L}}$ lattice points. At the testing phase, if the hypersphere has more than $N_{\text{L}}$ lattice points, we keep the $N_{\text{L}}$ closest points to the point corresponding to the received FTN signal; however, if the hypersphere has fewer than $N_{\text{L}}$ points, we increase the approximate initial radius by a value that depends on the standard deviation of the distribution of the output radii from the training phase. Then, the approximate value of the log-likelihood ratio (LLR) is calculated based on the obtained $N_{\text{L}}$ points. Simulation results show that the computational complexity of the proposed DL-LSD is lower than its counterpart of the original LSD by orders of magnitude.
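    The test-phase rule in this abstract (keep the $N_{\text{L}}$ closest points if the hypersphere holds too many, enlarge the radius if it holds too few) can be sketched as follows. This is only a toy illustration: it enumerates a small explicit candidate set by brute force, whereas a real sphere decoder searches a tree; the function name `select_list` and the fixed increment `step` are hypothetical, with `step` standing in for the standard-deviation-dependent value mentioned in the abstract. The initial `radius` would come from the trained network, which is not modeled here.

```python
import math

def select_list(candidates, received, radius, n_list, step):
    """Return (points, final_radius): the n_list candidates closest to
    `received`, growing `radius` by `step` until enough points lie inside.

    Assumptions: `candidates` is a small explicit list of lattice points,
    and `step` is a fixed positive increment chosen by the caller.
    """
    if n_list > len(candidates):
        raise ValueError("n_list exceeds the number of candidates")
    if step <= 0:
        raise ValueError("step must be positive")
    # Sort once so that truncating keeps the closest points.
    by_dist = sorted(candidates, key=lambda c: math.dist(c, received))
    while True:
        inside = [c for c in by_dist if math.dist(c, received) <= radius]
        if len(inside) >= n_list:
            return inside[:n_list], radius  # too many: keep the closest
        radius += step  # too few: enlarge the hypersphere and retry

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
chosen, final_radius = select_list(points, (0.0, 0.0), 0.5, 2, 0.5)
```

    With these toy inputs the initial radius of 0.5 covers only one point, so the radius is enlarged once before two points are returned.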
    Accurate ADMET Prediction with XGBoost. (arXiv:2204.07532v1 [q-bio.BM])
    The absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties are important in drug discovery as they define efficacy and safety. Here, we apply an ensemble of features, including fingerprints and descriptors, and a tree-based machine learning model, extreme gradient boosting, for accurate ADMET prediction. Our model performs well in the Therapeutics Data Commons ADMET benchmark group. For 22 tasks, our model is ranked first in 10 tasks and top 3 in 18 tasks.  ( 2 min )
    Revisiting joint decoding based multi-talker speech recognition with DNN acoustic model. (arXiv:2111.00009v2 [eess.AS] UPDATED)
    In typical multi-talker speech recognition systems, a neural network-based acoustic model predicts senone state posteriors for each speaker. These are later used by a single-talker decoder which is applied on each speaker-specific output stream separately. In this work, we argue that such a scheme is sub-optimal and propose a principled solution that decodes all speakers jointly. We modify the acoustic model to predict joint state posteriors for all speakers, enabling the network to express uncertainty about the attribution of parts of the speech signal to the speakers. We employ a joint decoder that can make use of this uncertainty together with higher-level language information. For this, we revisit decoding algorithms used in factorial generative models in early multi-talker speech recognition systems. In contrast with these early works, we replace the GMM acoustic model with DNN, which provides greater modeling power and simplifies part of the inference. We demonstrate the advantage of joint decoding in proof of concept experiments on a mixed-TIDIGITS dataset.
    Adjoined Networks: A Training Paradigm with Applications to Network Compression. (arXiv:2006.05624v5 [cs.LG] UPDATED)
    Compressing deep neural networks while maintaining accuracy is important when we want to deploy large, powerful models in production and/or edge devices. One common technique used to achieve this goal is knowledge distillation. Typically, the output of a static pre-defined teacher (a large base network) is used as soft labels to train and transfer information to a student (or smaller) network. In this paper, we introduce Adjoined Networks, or AN, a learning paradigm that trains both the original base network and the smaller compressed network together. In our training approach, the parameters of the smaller network are shared across both the base and the compressed networks. Using our training paradigm, we can simultaneously compress (the student network) and regularize (the teacher network) any architecture. In this paper, we focus on popular CNN-based architectures used for computer vision tasks. We conduct an extensive experimental evaluation of our training paradigm on various large-scale datasets. Using ResNet-50 as the base network, AN achieves 71.8% top-1 accuracy with only 1.8M parameters and 1.6 GFLOPs on the ImageNet data-set. We further propose Differentiable Adjoined Networks (DAN), a training paradigm that augments AN by using neural architecture search to jointly learn both the width and the weights for each layer of the smaller network. DAN achieves ResNet-50 level accuracy on ImageNet with $3.8\times$ fewer parameters and $2.2\times$ fewer FLOPs.  ( 2 min )
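    For readers unfamiliar with the baseline this abstract contrasts itself with, here is a minimal sketch of the standard knowledge-distillation loss, in which a fixed teacher's softened outputs serve as soft labels for the student. It does not implement Adjoined Networks or DAN. The temperature parameter and its default of 2.0 follow common distillation practice and are assumptions, not details from the abstract.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between the teacher's soft labels and the student's
    predictions, both softened with the same temperature (an assumption)."""
    soft_labels = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(soft_labels, student_probs))

# Hypothetical logits for a 3-class problem.
teacher = [2.0, 0.0, -1.0]
agreeing = distillation_loss([2.0, 0.0, -1.0], teacher)
disagreeing = distillation_loss([-1.0, 0.0, 2.0], teacher)
```

    A student whose logits match the teacher's attains the minimum of this loss, while a student that ranks the classes in the opposite order is penalized more heavily.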
    Barwise Compression Schemes for Audio-Based Music Structure Analysis. (arXiv:2202.04981v2 [cs.SD] UPDATED)
    Music Structure Analysis (MSA) consists of segmenting a piece of music into several distinct sections. We approach MSA within a compression framework, under the hypothesis that the structure is more easily revealed by a simplified representation of the original content of the song. More specifically, under the hypothesis that MSA is correlated with similarities occurring at the bar scale, this article introduces the use of linear and non-linear compression schemes on barwise audio signals. Compressed representations capture the most salient components of the different bars in the song and are then used to infer the song structure using a dynamic programming algorithm. This work explores both low-rank approximation models such as Principal Component Analysis or Nonnegative Matrix Factorization and "piece-specific" Auto-Encoding Neural Networks, with the objective of learning latent representations specific to a given song. Such approaches rely on neither supervision nor annotations, which are well known to be tedious to collect and possibly ambiguous in MSA description. In our experiments, several unsupervised compression schemes achieve a level of performance comparable to that of state-of-the-art supervised methods (for 3s tolerance) on the RWC-Pop dataset, showcasing the importance of the barwise compression processing for MSA.  ( 2 min )
    GitTables: A Large-Scale Corpus of Relational Tables. (arXiv:2106.07258v4 [cs.DB] UPDATED)
    The success of deep learning has sparked interest in improving relational table tasks, like data preparation and search, with table representation models trained on large table corpora. Existing table corpora primarily contain tables extracted from HTML pages, limiting the capability to represent offline database tables. To train and evaluate high-capacity models for applications beyond the Web, we need resources with tables that resemble relational database tables. Here we introduce GitTables, a corpus of 1M relational tables extracted from GitHub. Our continuing curation aims at growing the corpus to at least 10M tables. Analyses of GitTables show that its structure, content, and topical coverage differ significantly from existing table corpora. We annotate table columns in GitTables with semantic types, hierarchical relations and descriptions from Schema.org and DBpedia. The evaluation of our annotation pipeline on the T2Dv2 benchmark illustrates that our approach provides results on par with human annotations. We present three applications of GitTables, demonstrating its value for learned semantic type detection models, schema completion methods, and benchmarks for table-to-KG matching, data search, and preparation. We make the corpus and code available at https://gittables.github.io.  ( 2 min )
    NICE: Robust Scheduling through Reinforcement Learning-Guided Integer Programming. (arXiv:2109.12171v3 [cs.LG] UPDATED)
    Integer programs provide a powerful abstraction for representing a wide range of real-world scheduling problems. Despite their ability to model general scheduling problems, solving large-scale integer programs (IP) remains a computational challenge in practice. The incorporation of more complex objectives such as robustness to disruptions further exacerbates the computational challenge. We present NICE (Neural network IP Coefficient Extraction), a novel technique that combines reinforcement learning and integer programming to tackle the problem of robust scheduling. More specifically, NICE uses reinforcement learning to approximately represent complex objectives in an integer programming formulation. We use NICE to determine assignments of pilots to a flight crew schedule so as to reduce the impact of disruptions. We compare NICE with (1) a baseline integer programming formulation that produces a feasible crew schedule, and (2) a robust integer programming formulation that explicitly tries to minimize the impact of disruptions. Our experiments show that, across a variety of scenarios, NICE produces schedules resulting in 33% to 48% fewer disruptions than the baseline formulation. Moreover, in more severely constrained scheduling scenarios in which the robust integer program fails to produce a schedule within 90 minutes, NICE is able to build robust schedules in less than 2 seconds on average.  ( 2 min )
    Approximating Gradients for Differentiable Quality Diversity in Reinforcement Learning. (arXiv:2202.03666v2 [cs.LG] UPDATED)
    Consider the problem of training robustly capable agents. One approach is to generate a diverse collection of agent policies. Training can then be viewed as a quality diversity (QD) optimization problem, where we search for a collection of performant policies that are diverse with respect to quantified behavior. Recent work shows that differentiable quality diversity (DQD) algorithms greatly accelerate QD optimization when exact gradients are available. However, agent policies typically assume that the environment is not differentiable. To apply DQD algorithms to training agent policies, we must approximate gradients for performance and behavior. We propose two variants of the current state-of-the-art DQD algorithm that compute gradients via approximation methods common in reinforcement learning (RL). We evaluate our approach on four simulated locomotion tasks. One variant achieves results comparable to the current state-of-the-art in combining QD and RL, while the other performs comparably in two locomotion tasks. These results provide insight into the limitations of current DQD algorithms in domains where gradients must be approximated. Source code is available at https://github.com/icaros-usc/dqd-rl
    Universal approximation property of invertible neural networks. (arXiv:2204.07415v1 [cs.LG])
    Invertible neural networks (INNs) are neural network architectures with invertibility by design. Thanks to their invertibility and the tractability of Jacobian, INNs have various machine learning applications such as probabilistic modeling, generative modeling, and representation learning. However, their attractive properties often come at the cost of restricting the layer designs, which poses a question on their representation power: can we use these models to approximate sufficiently diverse functions? To answer this question, we have developed a general theoretical framework to investigate the representation power of INNs, building on a structure theorem of differential geometry. The framework simplifies the approximation problem of diffeomorphisms, which enables us to show the universal approximation properties of INNs. We apply the framework to two representative classes of INNs, namely Coupling-Flow-based INNs (CF-INNs) and Neural Ordinary Differential Equations (NODEs), and elucidate their high representation power despite the restrictions on their architectures.
    Sparsifying the Update Step in Graph Neural Networks. (arXiv:2109.00909v3 [cs.LG] UPDATED)
    Message-Passing Neural Networks (MPNNs), the most prominent Graph Neural Network (GNN) framework, celebrate much success in the analysis of graph-structured data. Concurrently, the sparsification of Neural Network models attracts a great amount of academic and industrial interest. In this paper we conduct a structured, empirical study of the effect of sparsification on the trainable part of MPNNs known as the Update step. To this end, we design a series of models to successively sparsify the linear transform in the Update step. Specifically, we propose the ExpanderGNN model with a tuneable sparsification rate and the Activation-Only GNN, which has no linear transform in the Update step. In agreement with a growing trend in the literature the sparsification paradigm is changed by initialising sparse neural network architectures rather than expensively sparsifying already trained architectures. Our novel benchmark models enable a better understanding of the influence of the Update step on model performance and outperform existing simplified benchmark models such as the Simple Graph Convolution. The ExpanderGNNs, and in some cases the Activation-Only models, achieve performance on par with their vanilla counterparts on several downstream tasks, while containing significantly fewer trainable parameters. Our code is publicly available at: https://github.com/ChangminWu/ExpanderGNN.
    Weakly-supervised Temporal Path Representation Learning with Contrastive Curriculum Learning -- Extended Version. (arXiv:2203.16110v3 [cs.LG] UPDATED)
    In step with the digitalization of transportation, we are witnessing a growing range of path-based smart-city applications, e.g., travel-time estimation and travel path ranking. A temporal path (TP) that includes temporal information, e.g., departure time, into the path is fundamental to enable such applications. In this setting, it is essential to learn generic temporal path representations (TPRs) that consider spatial and temporal correlations simultaneously and that can be used in different applications, i.e., downstream tasks. Existing methods fail to achieve the goal since (i) supervised methods require large amounts of task-specific labels when training and thus fail to generalize the obtained TPRs to other tasks; (ii) though unsupervised methods can learn generic representations, they disregard the temporal aspect, leading to sub-optimal results. To contend with the limitations of existing solutions, we propose a Weakly-Supervised Contrastive (WSC) learning model. We first propose a temporal path encoder that encodes both the spatial and temporal information of a temporal path into a TPR. To train the encoder, we introduce weak labels that are easy and inexpensive to obtain and are relevant to different tasks, e.g., temporal labels indicating peak vs. off-peak hours from departure times. Based on the weak labels, we construct meaningful positive and negative temporal path samples by considering both spatial and temporal information, which facilitates training the encoder using contrastive learning by pulling closer to the positive samples' representations while pushing away the negative samples' representations. To better guide contrastive learning, we propose a learning strategy based on Curriculum Learning such that the learning proceeds from easy to hard training instances. Experimental studies verify the effectiveness of the proposed method.
    Model Reprogramming: Resource-Efficient Cross-Domain Machine Learning. (arXiv:2202.10629v2 [cs.LG] UPDATED)
    In data-rich domains such as vision, language, and speech, deep learning prevails to deliver high-performance task-specific models and can even learn general task-agnostic representations for efficient finetuning to downstream tasks. However, deep learning in resource-limited domains still faces the following challenges including (i) limited data, (ii) constrained model development cost, and (iii) lack of adequate pre-trained models for effective finetuning. This paper introduces a new technique called model reprogramming to bridge this gap. Model reprogramming enables resource-efficient cross-domain machine learning by repurposing and reusing a well-developed pre-trained model from a source domain to solve tasks in a target domain without model finetuning, where the source and target domains can be vastly different. In many applications, model reprogramming outperforms transfer learning and training from scratch. This paper elucidates the methodology of model reprogramming, summarizes existing use cases, provides a theoretical explanation on the success of model reprogramming, and concludes with a discussion on open-ended research questions and opportunities. A list of model reprogramming studies is actively maintained and updated at https://github.com/IBM/model-reprogramming.  ( 2 min )
    Statistical-Computational Trade-offs in Tensor PCA and Related Problems via Communication Complexity. (arXiv:2204.07526v1 [math.ST])
    Tensor PCA is a stylized statistical inference problem introduced by Montanari and Richard to study the computational difficulty of estimating an unknown parameter from higher-order moment tensors. Unlike its matrix counterpart, Tensor PCA exhibits a statistical-computational gap, i.e., a sample size regime where the problem is information-theoretically solvable but conjectured to be computationally hard. This paper derives computational lower bounds on the run-time of memory bounded algorithms for Tensor PCA using communication complexity. These lower bounds specify a trade-off among the number of passes through the data sample, the sample size, and the memory required by any algorithm that successfully solves Tensor PCA. While the lower bounds do not rule out polynomial-time algorithms, they do imply that many commonly-used algorithms, such as gradient descent and power method, must have a higher iteration count when the sample size is not large enough. Similar lower bounds are obtained for Non-Gaussian Component Analysis, a family of statistical estimation problems in which low-order moment tensors carry no information about the unknown parameter. Finally, stronger lower bounds are obtained for an asymmetric variant of Tensor PCA and related statistical estimation problems. These results explain why many estimators for these problems use a memory state that is significantly larger than the effective dimensionality of the parameter of interest.
    The Distributed Information Bottleneck reveals the explanatory structure of complex systems. (arXiv:2204.07576v1 [cs.LG])
    The fruits of science are relationships made comprehensible, often by way of approximation. While deep learning is an extremely powerful way to find relationships in data, its use in science has been hindered by the difficulty of understanding the learned relationships. The Information Bottleneck (IB) is an information theoretic framework for understanding a relationship between an input and an output in terms of a trade-off between the fidelity and complexity of approximations to the relationship. Here we show that a crucial modification -- distributing bottlenecks across multiple components of the input -- opens fundamentally new avenues for interpretable deep learning in science. The Distributed Information Bottleneck throttles the downstream complexity of interactions between the components of the input, deconstructing a relationship into meaningful approximations found through deep learning without requiring custom-made datasets or neural network architectures. Applied to a complex system, the approximations illuminate aspects of the system's nature by restricting -- and monitoring -- the information about different components incorporated into the approximation. We demonstrate the Distributed IB's explanatory utility in systems drawn from applied mathematics and condensed matter physics. In the former, we deconstruct a Boolean circuit into approximations that isolate the most informative subsets of input components without requiring exhaustive search. In the latter, we localize information about future plastic rearrangement in the static structure of a sheared glass, and find the information to be more or less diffuse depending on the system's preparation. By way of a principled scheme of approximations, the Distributed IB brings much-needed interpretability to deep learning and enables unprecedented analysis of information flow through a system.
    Solving the Dirichlet problem for the Monge-Ampère equation using neural networks. (arXiv:2110.03310v2 [stat.ML] UPDATED)
    The Monge-Ampère equation is a fully nonlinear partial differential equation (PDE) of fundamental importance in analysis, geometry and in the applied sciences. In this paper we solve the Dirichlet problem associated with the Monge-Ampère equation using neural networks and we show that an ansatz using deep input convex neural networks can be used to find the unique convex solution. As part of our analysis we study the effect of singularities, discontinuities and noise in the source function, we consider nontrivial domains, and we investigate how the method performs in higher dimensions. We also compare this method to an alternative approach in which standard feed-forward networks are used together with a loss function which penalizes lack of convexity.
    Transferability Properties of Graph Neural Networks. (arXiv:2112.04629v2 [cs.LG] UPDATED)
    Graph neural networks (GNNs) are composed of layers consisting of graph convolutions and pointwise nonlinearities. Due to their invariance and stability properties, GNNs are provably successful at learning representations from data supported on moderate-scale graphs. However, they are difficult to learn on large-scale graphs. In this paper, we study the problem of training GNNs on graphs of moderate size and transferring them to large-scale graphs. We use graph limits called graphons to define limit objects for graph filters and GNNs -- graphon filters and graphon neural networks (WNNs) -- which we interpret as generative models for graph filters and GNNs. We then show that graphon filters and WNNs can be approximated by graph filters and GNNs sampled from them on weighted and stochastic graphs. Because the error of these approximations can be upper bounded, by a triangle inequality argument we can further bound the error of transferring a graph filter or a GNN across graphs. Our results show that (i) the transference error decreases with the graph size, and (ii) that graph filters have a transferability-discriminability tradeoff that in GNNs is alleviated by the scattering behavior of the nonlinearity. These findings are demonstrated empirically in a movie recommendation problem and in a decentralized control task.
    Kernel similarity matching with Hebbian neural networks. (arXiv:2204.07475v1 [cs.NE])
    Recent works have derived neural networks with online correlation-based learning rules to perform kernel similarity matching. These works applied existing linear similarity matching algorithms to nonlinear features generated with random Fourier methods. In this paper we attempt to perform kernel similarity matching by directly learning the nonlinear features. Our algorithm proceeds by deriving and then minimizing an upper bound for the sum of squared errors between output and input kernel similarities. The construction of our upper bound leads to online correlation-based learning rules which can be implemented with a 1-layer recurrent neural network. In addition to generating high-dimensional linearly separable representations, we show that our upper bound naturally yields representations which are sparse and selective for specific input patterns. We compare the approximation quality of our method to the neural random Fourier method and variants of the popular but non-biological "Nyström" method for approximating the kernel matrix. Our method appears to be comparable to or better than randomly sampled Nyström methods when the outputs are relatively low dimensional (although still potentially higher dimensional than the inputs) but less faithful when the outputs are very high dimensional.
    Prototype-based Domain Generalization Framework for Subject-Independent Brain-Computer Interfaces. (arXiv:2204.07358v1 [eess.SP])
    Brain-computer interfaces (BCIs) are challenging to use in practice due to the inter/intra-subject variability of electroencephalography (EEG). A BCI system generally requires a calibration procedure to obtain subject/session-specific data in order to tune the model each time the system is used. This issue is acknowledged as a key hindrance to BCI, and a new strategy based on domain generalization has recently emerged to address it. In light of this, we have concentrated on developing an EEG classification framework that can be applied directly to data from unknown domains (i.e. subjects), using only data previously acquired from separate subjects. For this purpose, in this paper, we propose a framework that employs the open-set recognition technique as an auxiliary task to learn subject-specific style features from the source dataset while helping the shared feature extractor with mapping the features of the unseen target dataset as a new unseen domain. Our aim is to impose cross-instance style invariance in the same domain and reduce the open space risk on the potential unseen subject in order to improve the generalization ability of the shared feature extractor. Our experiments showed that using the domain information as an auxiliary network increases the generalization performance.  ( 2 min )
    End-to-End Sensitivity-Based Filter Pruning. (arXiv:2204.07412v1 [cs.CV])
    In this paper, we present a novel sensitivity-based filter pruning algorithm (SbF-Pruner) to learn the importance scores of filters of each layer end-to-end. Our method learns the scores from the filter weights, enabling it to account for the correlations between the filters of each layer. Moreover, by training the pruning scores of all layers simultaneously our method can account for layer interdependencies, which is essential to find a performant sparse sub-network. Our proposed method can train and generate a pruned network from scratch in a straightforward, one-stage training process without requiring a pretrained network. Ultimately, we do not need layer-specific hyperparameters and pre-defined layer budgets, since SbF-Pruner can implicitly determine the appropriate number of channels in each layer. Our experimental results on different network architectures suggest that SbF-Pruner outperforms advanced pruning methods. Notably, on CIFAR-10, without requiring a pretrained baseline network, we obtain 1.02% and 1.19% accuracy gain on ResNet56 and ResNet110, compared to the baseline reported for state-of-the-art pruning algorithms. At the same time, SbF-Pruner reduces the parameter count by 52.3% (for ResNet56) and 54% (for ResNet101), which is better than the state-of-the-art pruning algorithms by a large margin of 9.5% and 6.6%.  ( 2 min )
    Email Spam Detection Using Hierarchical Attention Hybrid Deep Learning Method. (arXiv:2204.07390v1 [cs.CL])
    Email is one of the most widely used ways to communicate, with millions of people and businesses relying on it to share knowledge and information on a daily basis. Nevertheless, the rise in email users has been accompanied by a dramatic increase in spam emails in recent years. Processing and managing emails properly is getting increasingly difficult for individuals and companies. This article proposes a novel technique for email spam detection that is based on a combination of convolutional neural networks, gated recurrent units, and attention mechanisms. During system training, the network selectively focuses on the necessary parts of the email text. The use of convolution layers to extract more meaningful, abstract, and generalizable features by hierarchical representation is the major contribution of this study. Additionally, this contribution incorporates cross-dataset evaluation, which yields performance results that are more independent of the model's training dataset. According to the cross-dataset evaluation results, the proposed technique improves on the results of present attention-based techniques by utilizing temporal convolutions, which give more flexible receptive field sizes. The suggested technique's findings are compared to those of state-of-the-art models and show that our approach outperforms them.  ( 2 min )
    Towards Building a Personalized Dialogue Generator via Implicit User Persona Detection. (arXiv:2204.07372v1 [cs.CL])
    Current works on the generation of personalized dialogue primarily contribute to the agent avoiding a contradictory persona and making the response more informative. However, we found that the generated responses from these models are mostly self-centered with little care for the other party since they ignore the user's persona. Moreover, we consider that high-quality transmission is essentially built on apprehending the persona of the other party. Motivated by this, we propose a novel personalized dialogue generator that detects the implicit user persona. Because it is difficult to collect a large number of personas for each user, we attempt to model the user's potential persona and its representation from the dialogue in the absence of any external information. A perception variable and a fader variable are conceived utilizing Conditional Variational Inference. The two latent variables simulate the process of people being aware of the other party's persona and producing the corresponding expression in conversation. Finally, Posterior-discriminated Regularization is presented to enhance the training procedure. Empirical studies demonstrate that, compared with the state-of-the-art methods, ours is more concerned with the user's persona and outperforms them in evaluations.  ( 2 min )
    Anomalous Sound Detection Based on Machine Activity Detection. (arXiv:2204.07353v1 [eess.AS])
    We have developed an unsupervised anomalous sound detection method for machine condition monitoring that utilizes an auxiliary task -- detecting when the target machine is active. First, we train a model that detects machine activity by using normal data with machine activity labels and then use the activity-detection error as the anomaly score for a given sound clip if we have access to the ground-truth activity labels in the inference phase. If these labels are not available, the anomaly score is calculated through outlier detection on the embedding vectors obtained by the activity-detection model. Solving this auxiliary task enables the model to learn the difference between the target machine sounds and similar background noise, which makes it possible to identify small deviations in the target sounds. Experimental results showed that the proposed method improves the anomaly-detection performance of the conventional method complementarily by means of an ensemble.  ( 2 min )
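    As a rough illustration of using the activity-detection error as an anomaly score when ground-truth activity labels are available, the sketch below scores a clip by the mean binary cross-entropy between predicted and true per-frame activity. The abstract does not say which error measure is used, so cross-entropy, the function name activity_error_score, and the toy predictions are all assumptions made for illustration.

```python
import numpy as np

def activity_error_score(pred_activity, true_activity, tiny=1e-7):
    """Mean binary cross-entropy between predicted and true per-frame activity.

    Assumption: cross-entropy stands in for the unspecified activity-detection error.
    """
    p = np.clip(np.asarray(pred_activity, dtype=np.float64), tiny, 1.0 - tiny)
    y = np.asarray(true_activity, dtype=np.float64)
    return float(np.mean(-(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))))

# Hypothetical per-frame labels (1 = machine active) and detector outputs.
true_activity = np.array([0, 0, 1, 1, 1, 0])
normal_pred = np.array([0.1, 0.2, 0.9, 0.8, 0.9, 0.1])  # detector tracks the labels well
odd_pred = np.array([0.6, 0.5, 0.4, 0.5, 0.3, 0.7])     # detector is confused by the clip

normal_score = activity_error_score(normal_pred, true_activity)
odd_score = activity_error_score(odd_pred, true_activity)
print(f"normal clip score: {normal_score:.3f}, odd clip score: {odd_score:.3f}")
```

    A clip on which the activity detector disagrees with the labels receives the higher score; a decision threshold would still have to be chosen on held-out normal data.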
    SSR-HEF: Crowd Counting with Multi-Scale Semantic Refining and Hard Example Focusing. (arXiv:2204.07406v1 [cs.CV])
    Crowd counting based on density maps is generally regarded as a regression task. Deep learning is used to learn the mapping between image content and crowd density distribution. Although great success has been achieved, some pedestrians far away from the camera are difficult to detect, and the number of hard examples is often larger. Existing methods with a simple Euclidean distance loss optimize the hard and easy examples indiscriminately, so that the densities of hard examples are usually incorrectly predicted to be lower or even zero, which results in large counting errors. To address this problem, we are the first to propose the Hard Example Focusing (HEF) algorithm for the regression task of crowd counting. The HEF algorithm makes our model rapidly focus on hard examples by attenuating the contribution of easy examples. Higher importance is then given to the hard examples with wrong estimations. Moreover, the scale variations in crowd scenes are large, and the scale annotations are labor-intensive and expensive. By proposing a multi-Scale Semantic Refining (SSR) strategy, lower layers of our model can break through the limitation of deep learning to capture semantic features of different scales to sufficiently deal with the scale variation. We perform extensive experiments on six benchmark datasets to verify the proposed method. Results indicate the superiority of our proposed method over the state-of-the-art methods. Moreover, our designed model is smaller and faster.  ( 2 min )
    Crowd counting with segmentation attention convolutional neural network. (arXiv:2204.07380v1 [cs.CV])
    Deep learning holds undisputed dominance in crowd counting. In this paper, we propose a novel convolutional neural network (CNN) architecture called SegCrowdNet. Despite the complex background in crowd scenes, the proposed SegCrowdNet still adaptively highlights the human head region and suppresses the non-head region by segmentation. With the guidance of an attention mechanism, the proposed SegCrowdNet pays more attention to the human head region and automatically encodes the highly refined density map. The crowd count can be obtained by integrating the density map. To adapt to the variation of crowd counts, SegCrowdNet classifies the crowd count of each image into several groups. In addition, multi-scale features are learned and extracted in the proposed SegCrowdNet to overcome the scale variations of the crowd. To verify the effectiveness of our proposed method, extensive experiments are conducted on four challenging datasets. The results demonstrate that our proposed SegCrowdNet achieves excellent performance compared with the state-of-the-art methods.  ( 2 min )
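    The step of obtaining the crowd count by integrating the density map can be sketched in a few lines: on a pixel grid the integral is simply the sum of the map. The toy map below is synthetic, built from one unit-mass Gaussian blob per hypothetical head position; the helper names and the blob width are assumptions for illustration, not part of SegCrowdNet.

```python
import numpy as np

def count_from_density_map(density_map: np.ndarray) -> float:
    """Return the estimated crowd count: the discrete integral (sum) of the density map."""
    return float(density_map.sum())

def make_toy_density_map(shape, head_positions, sigma=2.0):
    """Place one unit-mass Gaussian blob per head position (synthetic toy data)."""
    rows, cols = np.indices(shape)
    density = np.zeros(shape, dtype=np.float64)
    for r, c in head_positions:
        blob = np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2.0 * sigma ** 2))
        density += blob / blob.sum()  # normalize so each head contributes exactly 1
    return density

# Three hypothetical head annotations on a 64x64 image.
toy_map = make_toy_density_map((64, 64), [(10, 12), (30, 40), (50, 20)])
estimated_count = count_from_density_map(toy_map)
print(f"estimated count: {estimated_count:.2f}")
```

    Because each blob is normalized to unit mass, the sum recovers the number of heads; a predicted density map is summed in the same way, and the counting error is the gap between that sum and the annotated count.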
    Towards a Unified Framework for Uncertainty-aware Nonlinear Variable Selection with Theoretical Guarantees. (arXiv:2204.07293v1 [stat.ML])
    We develop a simple and unified framework for nonlinear variable selection that incorporates model uncertainty and is compatible with a wide range of machine learning models (e.g., tree ensembles, kernel methods and neural networks). In particular, for a learned nonlinear model $f(\mathbf{x})$, we consider quantifying the importance of an input variable $\mathbf{x}^j$ using the integrated gradient measure $\psi_j = \Vert \frac{\partial}{\partial \mathbf{x}^j} f(\mathbf{x})\Vert^2_2$. We then (1) provide a principled approach for quantifying variable selection uncertainty by deriving its posterior distribution, and (2) show that the approach is generalizable even to non-differentiable models such as tree ensembles. Rigorous Bayesian nonparametric theorems are derived to guarantee the posterior consistency and asymptotic uncertainty of the proposed approach. Extensive simulation confirms that the proposed algorithm outperforms existing classic and recent variable selection methods.  ( 2 min )
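    To make the importance measure $\psi_j$ concrete, here is a minimal sketch that estimates it as the average squared partial derivative of $f$ over a sample of inputs. Reading the norm as an empirical average over data points is an assumption, as is the use of central finite differences in place of exact gradients; the function gradient_importance and the toy model are hypothetical, and no posterior uncertainty is computed.

```python
import numpy as np

def gradient_importance(f, X, eps=1e-5):
    """Estimate psi_j = mean_i (d f / d x_j at x_i)^2 with central finite differences.

    f maps an (n, d) array to an (n,) array of predictions; X has shape (n, d).
    """
    n, d = X.shape
    psi = np.zeros(d)
    for j in range(d):
        step = np.zeros(d)
        step[j] = eps  # perturb only feature j
        partial = (f(X + step) - f(X - step)) / (2.0 * eps)
        psi[j] = np.mean(partial ** 2)
    return psi

def toy_model(X):
    """Hypothetical learned model: linear in feature 0, nonlinear in feature 1, ignores feature 2."""
    return 3.0 * X[:, 0] + np.sin(X[:, 1])

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
psi = gradient_importance(toy_model, X)
print("importance per feature:", np.round(psi, 3))
```

    The unused third feature gets an importance of zero, while the linear feature with slope 3 gets an importance close to 9, which is the kind of ranking a variable selection rule would then threshold.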
    Spatio-Temporal-Frequency Graph Attention Convolutional Network for Aircraft Recognition Based on Heterogeneous Radar Network. (arXiv:2204.07360v1 [eess.SP])
    This paper proposes a knowledge-and-data-driven graph neural network-based collaboration learning model for reliable aircraft recognition in a heterogeneous radar network. The aircraft recognizability analysis shows that: (1) the semantic feature of an aircraft is motion patterns driven by the kinetic characteristics, and (2) the grammatical features contained in the radar cross-section (RCS) signals present spatial-temporal-frequency (STF) diversity decided by both the electromagnetic radiation shape and motion pattern of the aircraft. Then an STF graph attention convolutional network (STFGACN) is developed to distill semantic features from the RCS signals received by the heterogeneous radar network. Extensive experimental results verify that the STFGACN outperforms the baseline methods in terms of detection accuracy, and ablation experiments further show that expanding the information dimension yields considerable benefits for robust performance in the low signal-to-noise ratio region.  ( 2 min )
    Structural Analysis of Branch-and-Cut and the Learnability of Gomory Mixed Integer Cuts. (arXiv:2204.07312v1 [math.OC])
    The incorporation of cutting planes within the branch-and-bound algorithm, known as branch-and-cut, forms the backbone of modern integer programming solvers. These solvers are the foremost method for solving discrete optimization problems and thus have a vast array of applications in machine learning, operations research, and many other fields. Choosing cutting planes effectively is a major research topic in the theory and practice of integer programming. We conduct a novel structural analysis of branch-and-cut that pins down how every step of the algorithm is affected by changes in the parameters defining the cutting planes added to the input integer program. Our main application of this analysis is to derive sample complexity guarantees for using machine learning to determine which cutting planes to apply during branch-and-cut. These guarantees apply to infinite families of cutting planes, such as the family of Gomory mixed integer cuts, which are responsible for the main breakthrough speedups of integer programming solvers. We exploit geometric and combinatorial structure of branch-and-cut in our analysis, which provides a key missing piece for the recent generalization theory of branch-and-cut.  ( 2 min )
    Knowledgebra: An Algebraic Learning Framework for Knowledge Graph. (arXiv:2204.07328v1 [cs.LG])
    Knowledge graph (KG) representation learning aims to encode entities and relations into dense continuous vector spaces such that knowledge contained in a dataset can be consistently represented. Dense embeddings trained from KG datasets benefit a variety of downstream tasks such as KG completion and link prediction. However, existing KG embedding methods fall short of providing a systematic solution for the global consistency of knowledge representation. We developed a mathematical language for KGs based on an observation of their inherent algebraic structure, which we term Knowledgebra. By analyzing five distinct algebraic properties, we proved that the semigroup is the most reasonable algebraic structure for the relation embedding of a general knowledge graph. We implemented an instantiation model, SemE, using simple matrix semigroups, which exhibits state-of-the-art performance on standard datasets. Moreover, we proposed a regularization-based method to integrate chain-like logic rules derived from human knowledge into embedding training, which further demonstrates the power of the developed language. As far as we know, by applying abstract algebra in statistical learning, this work develops the first formal language for general knowledge graphs, and also sheds light on the problem of neural-symbolic integration from an algebraic perspective.  ( 2 min )
    XDBERT: Distilling Visual Information to BERT from Cross-Modal Systems to Improve Language Understanding. (arXiv:2204.07316v1 [cs.CL])
    Transformer-based models are widely used in natural language understanding (NLU) tasks, and multimodal transformers have been effective in visual-language tasks. This study explores distilling visual information from pretrained multimodal transformers to pretrained language encoders. Our framework is inspired by cross-modal encoders' success in visual-language tasks, while we alter the learning objective to cater to the language-heavy characteristics of NLU. After a small number of extra adaptation steps and finetuning, the proposed XDBERT (cross-modal distilled BERT) outperforms pretrained BERT on the general language understanding evaluation (GLUE) benchmark, the situations with adversarial generations (SWAG) benchmark, and readability benchmarks. We analyze the performance of XDBERT on GLUE to show that the improvement is likely visually grounded.  ( 2 min )
    Ensemble diverse hypotheses and knowledge distillation for unsupervised cross-subject adaptation. (arXiv:2204.07308v1 [cs.RO])
    Recognizing human locomotion intent and activities is important for controlling wearable robots while walking in complex environments. However, human-robot interface signals are usually user-dependent, which causes a classifier trained on source subjects to perform poorly on new subjects. To address this issue, this paper designs the ensemble diverse hypotheses and knowledge distillation (EDHKD) method to realize unsupervised cross-subject adaptation. EDH mitigates the divergence between labeled data of source subjects and unlabeled data of target subjects to accurately classify the locomotion modes of target subjects without labeling data. Compared to previous domain adaptation methods based on a single learner, which may only learn a subset of features from input signals, EDH can learn diverse features by incorporating multiple diverse feature generators and thus increases the accuracy and decreases the variance of classifying target data, but it sacrifices efficiency. To solve this problem, EDHKD (student) distills the knowledge from the EDH (teacher) to a single network to remain efficient and accurate. The performance of the EDHKD is theoretically proved and experimentally validated on a 2D moon dataset and two public human locomotion datasets. Experimental results show that the EDHKD outperforms all other methods. The EDHKD can classify target data with 96.9%, 94.4%, and 97.4% average accuracy on the above three datasets with a short computing time (1 ms). Compared to a benchmark (BM) method, the EDHKD increases the average accuracy by 1.3% and 7.1% for classifying the locomotion modes of target subjects. The EDHKD also stabilizes the learning curves. Therefore, the EDHKD is significant for increasing the generalization ability and efficiency of human intent prediction and human activity recognition systems, which will improve human-robot interactions.  ( 2 min )
    Methodical Advice Collection and Reuse in Deep Reinforcement Learning. (arXiv:2204.07254v1 [cs.LG])
    Reinforcement learning (RL) has shown great success in solving many challenging tasks via use of deep neural networks. Although using deep learning for RL brings immense representational power, it also causes a well-known sample-inefficiency problem. This means that the algorithms are data-hungry and require millions of training samples to converge to an adequate policy. One way to combat this issue is to use action advising in a teacher-student framework, where a knowledgeable teacher provides action advice to help the student. This work considers how to better leverage uncertainties about when a student should ask for advice and if the student can model the teacher to ask for less advice. The student could decide to ask for advice when it is uncertain or when both it and its model of the teacher are uncertain. In addition to this investigation, this paper introduces a new method to compute uncertainty for a deep RL agent using a secondary neural network. Our empirical results show that using dual uncertainties to drive advice collection and reuse may improve learning performance across several Atari games.  ( 2 min )
    A Differentially Private Probabilistic Framework for Modeling the Variability Across Federated Datasets of Heterogeneous Multi-View Observations. (arXiv:2204.07352v1 [cs.LG])
    We propose a novel federated learning paradigm to model data variability among heterogeneous clients in multi-centric studies. Our method is expressed through a hierarchical Bayesian latent variable model, where client-specific parameters are assumed to be realizations from a global distribution at the master level, which is in turn estimated to account for data bias and variability across clients. We show that our framework can be effectively optimized through expectation maximization (EM) over the latent master's distribution and the clients' parameters. We also introduce formal differential privacy (DP) guarantees compatible with our EM optimization scheme. We tested our method on the analysis of multi-modal medical imaging data and clinical scores from distributed clinical datasets of patients affected by Alzheimer's disease. We demonstrate that our method is robust when data is distributed in either an iid or a non-iid manner, even when local parameter perturbation is included to provide DP guarantees. Moreover, the variability of data, views and centers can be quantified in an interpretable manner, while guaranteeing high-quality data reconstruction as compared to state-of-the-art autoencoding models and federated learning schemes. The code is available at https://gitlab.inria.fr/epione/federated-multi-views-ppca.  ( 2 min )
    Crowd counting with crowd attention convolutional neural network. (arXiv:2204.07347v1 [cs.CV])
    Crowd counting is a challenging problem due to scene complexity and scale variation. Although deep learning has achieved great improvement in crowd counting, scene complexity affects the judgement of these methods and they often mistake some objects for people, causing potentially enormous errors in the crowd counting result. To address the problem, we propose a novel end-to-end model called Crowd Attention Convolutional Neural Network (CAT-CNN). Our CAT-CNN can adaptively assess the importance of a human head at each pixel location by automatically encoding a confidence map. With the guidance of the confidence map, the positions of human heads in the estimated density map receive more attention when encoding the final density map, which effectively avoids large misjudgements. The crowd count can be obtained by integrating the final density map. To encode a highly refined density map, the total crowd count of each image is classified in a designed classification task, and we are the first to explicitly map the prior of the population-level category to feature maps. To verify the efficiency of our proposed method, extensive experiments are conducted on three highly challenging datasets. Results establish the superiority of our method over many state-of-the-art methods.  ( 2 min )
    Graph Pooling for Graph Neural Networks: Progress, Challenges, and Opportunities. (arXiv:2204.07321v1 [cs.LG])
    Graph neural networks have emerged as a leading architecture for many graph-level tasks such as graph classification and graph generation with a notable improvement. Among these tasks, graph pooling is an essential component of graph neural network architectures for obtaining a holistic graph-level representation of the entire graph. Although a great variety of methods have been proposed in this promising and fast-developing research field, to the best of our knowledge, little effort has been made to systematically summarize these methods. To set the stage for the development of future works, in this paper, we attempt to fill this gap by providing a broad review of recent methods on graph pooling. Specifically, 1) we first propose a taxonomy of existing graph pooling methods and provide a mathematical summary for each category; 2) next, we provide an overview of the libraries related to graph pooling, including the commonly used datasets, model architectures for downstream tasks, and open-source implementations; 3) then, we further outline in brief the applications that incorporate the idea of graph pooling in a number of domains; 4) and finally, we discuss some critical challenges faced by the current studies and share our insights on potential directions for improving graph pooling in the future.  ( 2 min )
    Revisiting the Adversarial Robustness-Accuracy Tradeoff in Robot Learning. (arXiv:2204.07373v1 [cs.RO])
    Adversarial training (i.e., training on adversarially perturbed input data) is a well-studied method for making neural networks robust to potential adversarial attacks during inference. However, the improved robustness does not come for free but rather is accompanied by a decrease in overall model accuracy and performance. Recent work has shown that, in practical robot learning applications, the effects of adversarial training do not pose a fair trade-off but inflict a net loss when measured in holistic robot performance. This work revisits the robustness-accuracy trade-off in robot learning by systematically analyzing if recent advances in robust training methods and theory in conjunction with adversarial robot learning can make adversarial training suitable for real-world robot applications. We evaluate a wide variety of robot learning tasks ranging from autonomous driving in a high-fidelity environment amenable to sim-to-real deployment, to mobile robot gesture recognition. Our results demonstrate that, while these techniques make incremental improvements on the trade-off on a relative scale, the negative side-effects caused by adversarial training still outweigh the improvements by an order of magnitude. We conclude that more substantial advances in robust learning methods are necessary before they can benefit robot learning tasks in practice.  ( 2 min )
    Pushing the Limits of Simple Pipelines for Few-Shot Learning: External Data and Fine-Tuning Make a Difference. (arXiv:2204.07305v1 [cs.CV])
    Few-shot learning (FSL) is an important and topical problem in computer vision that has motivated extensive research into numerous methods spanning from sophisticated meta-learning methods to simple transfer learning baselines. We seek to push the limits of a simple-but-effective pipeline for more realistic and practical settings of few-shot image classification. To this end, we explore few-shot learning from the perspective of neural network architecture, as well as a three-stage pipeline of network updates under different data supplies, where unsupervised external data is considered for pre-training, base categories are used to simulate few-shot tasks for meta-training, and the scarcely labelled data of a novel task is taken for fine-tuning. We investigate questions such as: (1) How does pre-training on external data benefit FSL? (2) How can state-of-the-art transformer architectures be exploited? and (3) How does fine-tuning mitigate domain shift? Ultimately, we show that a simple transformer-based pipeline yields surprisingly good performance on standard benchmarks such as Mini-ImageNet, CIFAR-FS, CDFSL and Meta-Dataset. Our code and demo are available at https://hushell.github.io/pmf.  ( 2 min )
    auton-survival: an Open-Source Package for Regression, Counterfactual Estimation, Evaluation and Phenotyping with Censored Time-to-Event Data. (arXiv:2204.07276v1 [cs.LG])
    Applications of machine learning in healthcare often require working with time-to-event prediction tasks including prognostication of an adverse event, re-hospitalization or death. Such outcomes are typically subject to censoring due to loss of follow up. Standard machine learning methods cannot be applied in a straightforward manner to datasets with censored outcomes. In this paper, we present auton-survival, an open-source repository of tools to streamline working with censored time-to-event or survival data. auton-survival includes tools for survival regression, adjustment in the presence of domain shift, counterfactual estimation, phenotyping for risk stratification, evaluation, as well as estimation of treatment effects. Through real world case studies employing a large subset of the SEER oncology incidence data, we demonstrate the ability of auton-survival to rapidly support data scientists in answering complex health and epidemiological questions.  ( 2 min )
    Unsupervised Probabilistic Models for Sequential Electronic Health Records. (arXiv:2204.07292v1 [cs.LG])
    We develop an unsupervised probabilistic model for heterogeneous Electronic Health Record (EHR) data. Utilizing a mixture model formulation, our approach directly models sequences of arbitrary length, such as medications and laboratory results. This allows for subgrouping and incorporation of the dynamics underlying heterogeneous data types. The model consists of a layered set of latent variables that encode underlying structure in the data. These variables represent subject subgroups at the top layer, and unobserved states for sequences in the second layer. We train this model on episodic data from subjects receiving medical care in the Kaiser Permanente Northern California integrated healthcare delivery system. The resulting properties of the trained model generate novel insight from these complex and multifaceted data. In addition, we show how the model can be used to analyze sequences that contribute to assessment of mortality likelihood.  ( 2 min )
    Causal Transformer for Estimating Counterfactual Outcomes. (arXiv:2204.07258v1 [cs.LG])
    Estimating counterfactual outcomes over time from observational data is relevant for many applications (e.g., personalized medicine). Yet, state-of-the-art methods build upon simple long short-term memory (LSTM) networks, thus rendering inferences for complex, long-range dependencies challenging. In this paper, we develop a novel Causal Transformer for estimating counterfactual outcomes over time. Our model is specifically designed to capture complex, long-range dependencies among time-varying confounders. For this, we combine three transformer subnetworks with separate inputs for time-varying covariates, previous treatments, and previous outcomes into a joint network with in-between cross-attentions. We further develop a custom, end-to-end training procedure for our Causal Transformer. Specifically, we propose a novel counterfactual domain confusion loss to address confounding bias: it aims to learn adversarial balanced representations, so that they are predictive of the next outcome but non-predictive of the current treatment assignment. We evaluate our Causal Transformer based on synthetic and real-world datasets, where it achieves superior performance over current baselines. To the best of our knowledge, this is the first work proposing a transformer-based architecture for estimating counterfactual outcomes from longitudinal data.  ( 2 min )
    Characterizing the Efficiency vs. Accuracy Trade-off for Long-Context NLP Models. (arXiv:2204.07288v1 [cs.CL])
    With many real-world applications of Natural Language Processing (NLP) comprising long texts, there has been a rise in NLP benchmarks that measure the accuracy of models that can handle longer input sequences. However, these benchmarks do not consider the trade-offs between accuracy, speed, and power consumption as input sizes or model sizes are varied. In this work, we perform a systematic study of this accuracy vs. efficiency trade-off on two widely used long-sequence models - Longformer-Encoder-Decoder (LED) and Big Bird - during fine-tuning and inference on four datasets from the SCROLLS benchmark. To study how this trade-off differs across hyperparameter settings, we compare the models across four sequence lengths (1024, 2048, 3072, 4096) and two model sizes (base and large) under a fixed resource budget. We find that LED consistently achieves better accuracy at lower energy costs than Big Bird. For summarization, we find that increasing model size is more energy efficient than increasing sequence length for higher accuracy. However, this comes at the cost of a large drop in inference speed. For question answering, we find that smaller models are both more efficient and more accurate due to the larger training batch sizes possible under a fixed resource budget.  ( 2 min )
    Active Learning for Regression and Classification by Inverse Distance Weighting. (arXiv:2204.07177v1 [cs.LG])
    This paper proposes an active learning algorithm for solving regression and classification problems based on inverse-distance weighting functions for selecting the feature vectors to query. The algorithm has the following features: (i) supports both pool-based and population-based sampling; (ii) is independent of the type of predictor used; (iii) can handle known and unknown constraints on the queryable feature vectors; and (iv) can run either sequentially, or in batch mode, depending on how often the predictor is retrained. The method's potential is shown in numerical tests on illustrative synthetic problems and real-world regression and classification datasets from the UCI repository. A Python implementation of the algorithm that we call IDEAL (Inverse-Distance based Exploration for Active Learning), is available at \url{this http URL}.  ( 2 min )
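    As a rough illustration of the inverse-distance weighting idea named in the abstract, the sketch below shows generic IDW interpolation plus a simple distance-based exploration rule over a pool. It is not the IDEAL implementation: the function names `idw_weights`, `idw_predict` and `pick_query`, the squared-distance weighting, and the farthest-point rule are all assumptions made for illustration.

```python
import numpy as np

def idw_weights(x, X, eps=1e-12):
    """Normalized inverse squared-distance weights of query x w.r.t. samples X."""
    d2 = np.sum((np.asarray(X, dtype=float) - np.asarray(x, dtype=float)) ** 2, axis=1)
    hit = d2 < eps
    if hit.any():
        # The query coincides with a sample: put all weight on it.
        w = hit.astype(float)
    else:
        w = 1.0 / d2
    return w / w.sum()

def idw_predict(x, X, y):
    """IDW interpolation of targets y at query point x."""
    return float(idw_weights(x, X) @ np.asarray(y, dtype=float))

def pick_query(pool, X):
    """Hypothetical pure-exploration rule: choose the pool point whose
    nearest already-sampled point is farthest away."""
    pool = np.asarray(pool, dtype=float)
    X = np.asarray(X, dtype=float)
    d2 = ((pool[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return int(np.argmax(d2.min(axis=1)))
```

    The weights depend only on distances between feature vectors, so a rule of this kind does not need to know which predictor is being trained, in line with feature (ii) of the abstract.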
    Hierarchical Embedded Bayesian Additive Regression Trees. (arXiv:2204.07207v1 [stat.ME])
    We propose a simple yet powerful extension of Bayesian Additive Regression Trees which we name Hierarchical Embedded BART (HE-BART). The model allows for random effects to be included at the terminal node level of a set of regression trees, making HE-BART a non-parametric alternative to mixed effects models which avoids the need for the user to specify the structure of the random effects in the model, whilst maintaining the prediction and uncertainty calibration properties of standard BART. Using simulated and real-world examples, we demonstrate that this new extension yields superior predictions for many of the standard mixed effects models' example data sets, and yet still provides consistent estimates of the random effect variances. In a future version of this paper, we will outline its use in larger, more advanced data sets and structures.  ( 2 min )
    Spatio-Temporal Analysis of Transformer based Architecture for Attention Estimation from EEG. (arXiv:2204.07162v1 [q-bio.NC])
    For many years now, understanding the mechanisms of the brain has been a major research subject in many different fields. Brain signal processing, and especially the electroencephalogram (EEG), has recently attracted growing interest in both academia and industry. One of the main examples is the increasing number of Brain-Computer Interfaces (BCI) aiming to link brains and computers. In this paper, we present a novel framework allowing us to retrieve the attention state, i.e., the degree of attention given to a specific task, from EEG signals. While previous methods often consider the spatial relationship in EEG through electrodes and process them in recurrent or convolutional architectures, we propose here to also exploit the spatial and temporal information with a transformer-based network, an architecture that has already shown its strength in many machine-learning (ML) related studies, e.g. machine translation. In addition to this novel architecture, an extensive study of the feature extraction methods, frequency bands and temporal window lengths has also been carried out. The proposed network has been trained and validated on two public datasets and achieves better results than state-of-the-art models. Beyond these improved results, the framework could be used in real applications, e.g. Attention Deficit Hyperactivity Disorder (ADHD) symptoms or vigilance during a driving assessment.  ( 2 min )
    Testing distributional assumptions of learning algorithms. (arXiv:2204.07196v1 [cs.LG])
    There are many important high dimensional function classes that have fast agnostic learning algorithms when strong assumptions on the distribution of examples can be made, such as Gaussianity or uniformity over the domain. But how can one be sufficiently confident that the data indeed satisfies the distributional assumption, so that one can trust in the output quality of the agnostic learning algorithm? We propose a model by which to systematically study the design of tester-learner pairs $(\mathcal{A},\mathcal{T})$, such that if the distribution on examples in the data passes the tester $\mathcal{T}$ then one can safely trust the output of the agnostic learner $\mathcal{A}$ on the data. To demonstrate the power of the model, we apply it to the classical problem of agnostically learning halfspaces under the standard Gaussian distribution and present a tester-learner pair with a combined run-time of $n^{\tilde{O}(1/\epsilon^4)}$. This qualitatively matches that of the best known ordinary agnostic learning algorithms for this task. In contrast, finite sample Gaussian distribution testers do not exist for the $L_1$ and EMD distance measures. A key step in the analysis is a novel characterization of concentration and anti-concentration properties of a distribution whose low-degree moments approximately match those of a Gaussian. We also use tools from polynomial approximation theory. In contrast, we show strong lower bounds on the combined run-times of tester-learner pairs for the problems of agnostically learning convex sets under the Gaussian distribution and for monotone Boolean functions under the uniform distribution over $\{0,1\}^n$. Through these lower bounds we exhibit natural problems where there is a dramatic gap between standard agnostic learning run-time and the run-time of the best tester-learner pair.  ( 2 min )
    Causal Disentanglement with Network Information for Debiased Recommendations. (arXiv:2204.07221v1 [cs.IR])
    Recommender systems aim to recommend new items to users by learning user and item representations. In practice, these representations are highly entangled as they consist of information about multiple factors, including users' interests and item attributes, along with confounding factors such as user conformity and item popularity. Considering these entangled representations for inferring user preference may lead to biased recommendations (e.g., when the recommender model recommends popular items even if they do not align with the user's interests). Recent research proposes to debias by modeling a recommender system from a causal perspective. The exposure and the ratings are analogous to the treatment and the outcome in the causal inference framework, respectively. The critical challenge in this setting is accounting for the hidden confounders. These confounders are unobserved, making it hard to measure them. On the other hand, since these confounders affect both the exposure and the ratings, it is essential to account for them in generating debiased recommendations. To better approximate hidden confounders, we propose to leverage network information (i.e., user-social and user-item networks), which are shown to influence how users discover and interact with an item. Aside from user conformity, aspects of confounding such as item popularity present in the network information are also captured in our method with the aid of causal disentanglement, which unravels the learned representations into independent factors that are responsible for (a) modeling the exposure of an item to the user, (b) predicting the ratings, and (c) controlling the hidden confounders. Experiments on real-world datasets validate the effectiveness of the proposed model for debiasing recommender systems.  ( 2 min )
    Robotic and Generative Adversarial Attacks in Offline Writer-independent Signature Verification. (arXiv:2204.07246v1 [cs.RO])
    This study explores how robots and generative approaches can be used to mount successful false-acceptance adversarial attacks on signature verification systems. Initially, a convolutional neural network topology and data augmentation strategy are explored and tuned, producing an 87.12% accurate model for the verification of 2,640 human signatures. Two robots are then tasked with forging 50 signatures, where 25 are used for the verification attack, and the remaining 25 are used for tuning of the model to defend against them. Adversarial attacks on the system show that there exists an information security risk; the Line-us robotic arm can fool the system 24% of the time and the iDraw 2.0 robot 32% of the time. A conditional GAN finds similar success, with around 30% of forged signatures misclassified as genuine. Following fine-tune transfer learning of robotic and generative data, adversarial attacks are reduced below the model threshold by both robots and the GAN. It is observed that tuning the model reduces the risk of attack by robots to 8% and 12%, and that conditional generative adversarial attacks can be reduced to 4% when 25 images are presented and 5% when 1000 images are presented.  ( 2 min )
    The training response law explains how deep neural networks learn. (arXiv:2204.07291v1 [cond-mat.dis-nn])
    Deep neural networks are a widely applied technology of this decade. Despite their fruitful applications, the mechanism behind them is still to be elucidated. We study the learning process with a very simple supervised learning encoding problem. As a result, we found a simple law in the training response, which describes the neural tangent kernel. The response consists of a power-law-like decay multiplied by a simple response kernel. We can construct a simple mean-field dynamical model with the law, which explains how the network learns. During learning, the input space is split into sub-spaces through competition between the kernels. With the iterated splits and the aging, the network gains complexity but finally loses its plasticity.  ( 2 min )
    Convergence and Implicit Regularization Properties of Gradient Descent for Deep Residual Networks. (arXiv:2204.07261v1 [cs.LG])
    We prove linear convergence of gradient descent to a global minimum for the training of deep residual networks with constant layer width and smooth activation function. We further show that the trained weights, as a function of the layer index, admit a scaling limit which is Hölder continuous as the depth of the network tends to infinity. The proofs are based on non-asymptotic estimates of the loss function and of norms of the network weights along the gradient descent path. We illustrate the relevance of our theoretical results to practical settings using detailed numerical experiments on supervised learning problems.  ( 2 min )
    Learning two-phase microstructure evolution using neural operators and autoencoder architectures. (arXiv:2204.07230v1 [cond-mat.mtrl-sci])
    Phase-field modeling is an effective mesoscale method for capturing the evolution dynamics of materials, e.g., in spinodal decomposition of a two-phase mixture. However, the accuracy of high-fidelity phase field models comes at a substantial computational cost. Hence, fast and generalizable surrogate models are needed to alleviate the cost in computationally taxing processes such as in optimization and design of materials. The intrinsically discontinuous nature of the physical phenomena, caused by the presence of sharp phase boundaries, makes the training of the surrogate model cumbersome. We develop a new framework that integrates a convolutional autoencoder architecture with a deep neural operator (DeepONet) to learn the dynamic evolution of a two-phase mixture. We utilize the convolutional autoencoder to provide a compact representation of the microstructure data in a low-dimensional latent space. DeepONet, which consists of two sub-networks, one for encoding the input function at a fixed number of sensor locations (branch net) and another for encoding the locations for the output functions (trunk net), learns the mesoscale dynamics of the microstructure evolution in the latent space. The decoder part of the convolutional autoencoder can then reconstruct the time-evolved microstructure from the DeepONet predictions. The result is an efficient and accurate accelerated phase-field framework that outperforms other neural-network-based approaches while at the same time being robust to noisy inputs.  ( 2 min )
    Minimizing Control for Credit Assignment with Strong Feedback. (arXiv:2204.07249v1 [cs.NE])
    The success of deep learning attracted interest in whether the brain learns hierarchical representations using gradient-based learning. However, current biologically plausible methods for gradient-based credit assignment in deep neural networks need infinitesimally small feedback signals, which is problematic in biologically realistic noisy environments and at odds with experimental evidence in neuroscience showing that top-down feedback can significantly influence neural activity. Building upon deep feedback control (DFC), a recently proposed credit assignment method, we combine strong feedback influences on neural activity with gradient-based learning and show that this naturally leads to a novel view on neural network optimization. Instead of gradually changing the network weights towards configurations with low output loss, weight updates gradually minimize the amount of feedback required from a controller that drives the network to the supervised output label. Moreover, we show that the use of strong feedback in DFC allows learning forward and feedback connections simultaneously, using a learning rule fully local in space and time. We complement our theoretical results with experiments on standard computer-vision benchmarks, showing competitive performance to backpropagation as well as robustness to noise. Overall, our work presents a fundamentally novel view of learning as control minimization, while sidestepping biologically unrealistic assumptions.  ( 2 min )
    Brazilian Court Documents Clustered by Similarity Together Using Natural Language Processing Approaches with Transformers. (arXiv:2204.07182v1 [cs.AI])
    Recent advances in artificial intelligence (AI) have yielded promising results in solving complex problems in the area of Natural Language Processing (NLP), making it an important tool to help in the expeditious resolution of judicial proceedings in the legal area. In this context, this work targets the problem of detecting the degree of similarity between judicial documents that can be achieved in the inference group, by applying six NLP techniques based on transformers, namely BERT, GPT-2 and RoBERTa, each pre-trained in the Brazilian Portuguese language and each further specialized using 210,000 legal proceedings. Documents were pre-processed and had their content transformed into a vector representation using these NLP techniques. Unsupervised learning was used to cluster the lawsuits, calculating the quality of the model based on the cosine of the distance between the elements of the group to its centroid. We noticed that models based on transformers present better performance when compared to previous research, highlighting the RoBERTa model specialized in the Brazilian Portuguese language, making it possible to advance the current state of the art in the area of NLP applied to the legal sector.  ( 2 min )
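    The cluster-quality measure mentioned in the abstract can be read as a cosine-based score between each document vector and its cluster centroid. The sketch below is one plausible reading, not the authors' code: the function name `mean_cosine_to_centroid` is hypothetical, and it assumes every embedding and the centroid are non-zero vectors.

```python
import numpy as np

def mean_cosine_to_centroid(embeddings):
    """Average cosine similarity between each vector and the cluster centroid.

    embeddings : array-like of shape (n_documents, dim), all rows non-zero
    Returns a value in [-1, 1]; values near 1 indicate a tight cluster.
    NOTE: hypothetical helper; assumes non-zero rows and non-zero centroid.
    """
    E = np.asarray(embeddings, dtype=float)
    centroid = E.mean(axis=0)
    dots = E @ centroid
    norms = np.linalg.norm(E, axis=1) * np.linalg.norm(centroid)
    return float(np.mean(dots / norms))
```

    Under this reading, a cluster of identical document vectors scores exactly 1, and two orthogonal vectors score about 0.707, so higher values mean the documents in a group point in more similar directions.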
    Relaxing Equivariance Constraints with Non-stationary Continuous Filters. (arXiv:2204.07178v1 [cs.LG])
    Equivariances provide useful inductive biases in neural network modeling, with the translation equivariance of convolutional neural networks being a canonical example. Equivariances can be embedded in architectures through weight-sharing and place symmetry constraints on the functions a neural network can represent. The type of symmetry is typically fixed and has to be chosen in advance. Although some tasks are inherently equivariant, many tasks do not strictly follow such symmetries. In such cases, equivariance constraints can be overly restrictive. In this work, we propose a parameter-efficient relaxation of equivariance that can effectively interpolate between (i) a non-equivariant linear product, (ii) a strict-equivariant convolution, and (iii) a strictly-invariant mapping. The proposed parameterization can be thought of as a building block to allow adjustable symmetry structure in neural networks. Compared to non-equivariant or strict-equivariant baselines, we experimentally verify that soft equivariance leads to improved performance in terms of test accuracy on CIFAR-10 and CIFAR-100 image classification tasks.  ( 2 min )
    Alternating Mahalanobis Distance Minimization for Stable and Accurate CP Decomposition. (arXiv:2204.07208v1 [cs.LG])
    CP decomposition (CPD) is prevalent in chemometrics, signal processing, data mining and many more fields. While many algorithms have been proposed to compute the CPD, alternating least squares (ALS) remains one of the most widely used algorithms for computing the decomposition. Recent works have introduced the notion of eigenvalues and singular values of a tensor and explored applications of eigenvectors and singular vectors in areas like signal processing, data analytics and in various other fields. We introduce a new formulation for deriving singular values and vectors of a tensor by considering the critical points of a function different from what is used in the previous work. Computing these critical points in an alternating manner motivates an alternating optimization algorithm which corresponds to the alternating least squares algorithm in the matrix case. However, for tensors with order greater than or equal to $3$, it minimizes an objective function which is different from the commonly used least squares loss. Alternating optimization of this new objective leads to simple updates to the factor matrices with the same asymptotic computational cost as ALS. We show that a subsweep of this algorithm can achieve a superlinear convergence rate for exact CPD with known rank and verify it experimentally. We then view the algorithm as optimizing a Mahalanobis distance with respect to each factor with ground metric dependent on the other factors. This perspective allows us to generalize our approach to interpolate between updates corresponding to the ALS and the new algorithm to manage the tradeoff between stability and fitness of the decomposition. Our experimental results show that for approximating synthetic and real-world tensors, this algorithm and its variants converge to a better conditioned decomposition with comparable and sometimes better fitness as compared to the ALS algorithm.  ( 2 min )
    Harnessing Interpretable Machine Learning for Origami Feature Design and Pattern Selection. (arXiv:2204.07235v1 [cond-mat.soft])
    Engineering design of origami systems is challenging because comparing different origami patterns requires using categorical features and evaluating multi-physics behavior targets introduces multi-objective problems. This work shows that a decision tree machine learning method is particularly suitable for the inverse design of origami. This interpretable machine learning method can reveal complex interactions between categorical features and continuous features for comparing different origami patterns, can tackle multi-objective problems for designing active origami with multi-physics performance targets, and can extend existing origami shape fitting algorithms to further consider non-geometrical performances of origami systems. The proposed framework shows a holistic way of designing active origami systems for various applications such as metamaterials, deployable structures, soft robots, biomedical devices, and many more.  ( 2 min )
    Physics-Aware Recurrent Convolutional (PARC) Neural Networks to Assimilate Meso-scale Reactive Mechanics of Energetic Materials. (arXiv:2204.07234v1 [cond-mat.mtrl-sci])
    The thermomechanical properties of energetic materials (EM) are known to be a function of their microscopic structures, i.e., morphological configurations of crystals and pores. This microstructural dependency has motivated vigorous research in the EM community, seeking to engineer material microstructures with targeted properties and performance under the materials-by-design paradigm. However, establishing the complex structure-property-performance (SPP) relationships of EMs demands extensive experimental and simulation efforts, and assimilating and encapsulating these relationships in usable models is a challenge. Here, we present a novel deep learning method, Physics-Aware Recurrent Convolutional (PARC) Neural Network, that can "learn" the mesoscale thermo-mechanics of EM microstructures during the shock-to-detonation transition (SDT). We show that this new approach can produce accurate high-fidelity predictions of time-evolving temperature and pressure fields of the same quality as the state-of-the-art direct numerical simulations (DNS), despite the dramatic reduction of computing time, from hours and days on a high-performance computing cluster (HPC) to a little more than a second on a commodity laptop. We also demonstrate that PARC can provide physical insights, i.e., the artificial neurons can illuminate the underlying physics by identifying which microstructural features led to critical hotspots and what are the characteristics of "critical" versus "non-critical" microstructures. This new knowledge generated alongside the capacity to conduct high-throughput experiments will broaden our theoretical understanding of the initiation mechanisms of EM detonation, as a step towards engineering EMs with specific properties.  ( 2 min )
    Diagnosing and Fixing Manifold Overfitting in Deep Generative Models. (arXiv:2204.07172v1 [stat.ML])
    Likelihood-based, or explicit, deep generative models use neural networks to construct flexible high-dimensional densities. This formulation directly contradicts the manifold hypothesis, which states that observed data lies on a low-dimensional manifold embedded in high-dimensional ambient space. In this paper we investigate the pathologies of maximum-likelihood training in the presence of this dimensionality mismatch. We formally prove that degenerate optima are achieved wherein the manifold itself is learned but not the distribution on it, a phenomenon we call manifold overfitting. We propose a class of two-step procedures consisting of a dimensionality reduction step followed by maximum-likelihood density estimation, and prove that they recover the data-generating distribution in the nonparametric regime, thus avoiding manifold overfitting. We also show that these procedures enable density estimation on the manifolds learned by implicit models, such as generative adversarial networks, hence addressing a major shortcoming of these models. Several recently proposed methods are instances of our two-step procedures; we thus unify, extend, and theoretically justify a large class of models.  ( 2 min )
  • Open

    Solving the Dirichlet problem for the Monge-Ampère equation using neural networks. (arXiv:2110.03310v2 [stat.ML] UPDATED)
    The Monge-Ampère equation is a fully nonlinear partial differential equation (PDE) of fundamental importance in analysis, geometry and in the applied sciences. In this paper we solve the Dirichlet problem associated with the Monge-Ampère equation using neural networks and we show that an ansatz using deep input convex neural networks can be used to find the unique convex solution. As part of our analysis we study the effect of singularities, discontinuities and noise in the source function, we consider nontrivial domains, and we investigate how the method performs in higher dimensions. We also compare this method to an alternative approach in which standard feed-forward networks are used together with a loss function which penalizes lack of convexity.
    Adjoined Networks: A Training Paradigm with Applications to Network Compression. (arXiv:2006.05624v5 [cs.LG] UPDATED)
    Compressing deep neural networks while maintaining accuracy is important when we want to deploy large, powerful models in production and/or on edge devices. One common technique used to achieve this goal is knowledge distillation. Typically, the output of a static pre-defined teacher (a large base network) is used as soft labels to train and transfer information to a student (or smaller) network. In this paper, we introduce Adjoined Networks, or AN, a learning paradigm that trains both the original base network and the smaller compressed network together. In our training approach, the parameters of the smaller network are shared across both the base and the compressed networks. Using our training paradigm, we can simultaneously compress (the student network) and regularize (the teacher network) any architecture. In this paper, we focus on popular CNN-based architectures used for computer vision tasks. We conduct an extensive experimental evaluation of our training paradigm on various large-scale datasets. Using ResNet-50 as the base network, AN achieves 71.8% top-1 accuracy with only 1.8M parameters and 1.6 GFLOPs on the ImageNet dataset. We further propose Differentiable Adjoined Networks (DAN), a training paradigm that augments AN by using neural architecture search to jointly learn both the width and the weights for each layer of the smaller network. DAN achieves ResNet-50 level accuracy on ImageNet with $3.8\times$ fewer parameters and $2.2\times$ fewer FLOPs.
    Novelty Search in Representational Space for Sample Efficient Exploration. (arXiv:2009.13579v3 [cs.LG] UPDATED)
    We present a new approach for efficient exploration which leverages a low-dimensional encoding of the environment learned with a combination of model-based and model-free objectives. Our approach uses intrinsic rewards that are based on the distance of nearest neighbors in the low dimensional representational space to gauge novelty. We then leverage these intrinsic rewards for sample-efficient exploration with planning routines in representational space for hard exploration tasks with sparse rewards. One key element of our approach is the use of information theoretic principles to shape our representations in a way so that our novelty reward goes beyond pixel similarity. We test our approach on a number of maze tasks, as well as a control problem and show that our exploration approach is more sample-efficient compared to strong baselines.
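    The novelty signal described in the abstract, an intrinsic reward based on nearest-neighbor distances in a low-dimensional representation space, can be sketched as follows. This is an illustrative reading only: the function name `novelty_reward`, the Euclidean metric, the averaging over `k` neighbors, and the default `k=3` are assumptions, and the memory of past encodings is assumed to be non-empty.

```python
import numpy as np

def novelty_reward(z, memory, k=3):
    """Mean Euclidean distance from encoding z to its k nearest neighbors.

    z      : encoded state, shape (dim,)
    memory : encodings of previously visited states, shape (n, dim), n >= 1
    k      : number of neighbors (capped at n)
    NOTE: hypothetical helper; metric and averaging are assumptions.
    """
    memory = np.asarray(memory, dtype=float)
    dists = np.linalg.norm(memory - np.asarray(z, dtype=float), axis=1)
    k = min(k, len(dists))
    # Larger distances to the closest visited encodings mean a more novel state.
    return float(np.sort(dists)[:k].mean())
```

    An encoding that sits on top of previously visited states receives a small reward, while one far from everything in memory receives a large one, which is what pushes an agent toward unexplored regions.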
    Statistical-Computational Trade-offs in Tensor PCA and Related Problems via Communication Complexity. (arXiv:2204.07526v1 [math.ST])
    Tensor PCA is a stylized statistical inference problem introduced by Montanari and Richard to study the computational difficulty of estimating an unknown parameter from higher-order moment tensors. Unlike its matrix counterpart, Tensor PCA exhibits a statistical-computational gap, i.e., a sample size regime where the problem is information-theoretically solvable but conjectured to be computationally hard. This paper derives computational lower bounds on the run-time of memory bounded algorithms for Tensor PCA using communication complexity. These lower bounds specify a trade-off among the number of passes through the data sample, the sample size, and the memory required by any algorithm that successfully solves Tensor PCA. While the lower bounds do not rule out polynomial-time algorithms, they do imply that many commonly-used algorithms, such as gradient descent and power method, must have a higher iteration count when the sample size is not large enough. Similar lower bounds are obtained for Non-Gaussian Component Analysis, a family of statistical estimation problems in which low-order moment tensors carry no information about the unknown parameter. Finally, stronger lower bounds are obtained for an asymmetric variant of Tensor PCA and related statistical estimation problems. These results explain why many estimators for these problems use a memory state that is significantly larger than the effective dimensionality of the parameter of interest.
    Soft Truncation: A Universal Training Technique of Score-based Diffusion Model for High Precision Score Estimation. (arXiv:2106.05527v4 [cs.LG] UPDATED)
    Recent advances in diffusion models bring the state-of-the-art performance on image generation tasks. However, empirical results on previous research in diffusion models imply that there is an inverse correlation on performances for density estimation and sample generation. This paper analyzes that the inverse correlation arises because density estimation is mostly contributed from small diffusion time, whereas sample generation mainly depends on large diffusion time. However, training score network on both small and large diffusion time is demanding because of the loss imbalance issue. To successfully train the score network on both small and large diffusion time, this paper introduces a training technique, Soft Truncation, that softens the truncation time for every mini-batch update, which is universally applicable to any types of diffusion models. It turns out that Soft Truncation is equivalent to a diffusion model with a general weight, and we prove the variational bound of the general weighted diffusion model. In view of this variational bound, Soft Truncation becomes a natural way to train the score network. In experiments, Soft Truncation achieves the state-of-the-art performance on CIFAR-10, CelebA, CelebA-HQ $256\times 256$, and STL-10 datasets.  ( 2 min )
    Latent Gaussian Model Boosting. (arXiv:2105.08966v4 [cs.LG] UPDATED)
    Latent Gaussian models and boosting are widely used techniques in statistics and machine learning. Tree-boosting shows excellent prediction accuracy on many data sets, but potential drawbacks are that it assumes conditional independence of samples, produces discontinuous predictions for, e.g., spatial data, and it can have difficulty with high-cardinality categorical variables. Latent Gaussian models, such as Gaussian process and grouped random effects models, are flexible prior models which explicitly model dependence among samples and which allow for efficient learning of predictor functions and for making probabilistic predictions. However, existing latent Gaussian models usually assume either a zero or a linear prior mean function which can be an unrealistic assumption. This article introduces a novel approach that combines boosting and latent Gaussian models to remedy the above-mentioned drawbacks and to leverage the advantages of both techniques. We obtain increased prediction accuracy compared to existing approaches in both simulated and real-world data experiments.  ( 2 min )
    Two-Step Meta-Learning for Time-Series Forecasting Ensemble. (arXiv:2011.10545v2 [stat.ML] UPDATED)
    Amounts of historical data collected increase and business intelligence applicability with automatic forecasting of time series are in high demand. While no single time series modeling method is universal to all types of dynamics, forecasting using an ensemble of several methods is often seen as a compromise. Instead of fixing ensemble diversity and size, we propose to predict these aspects adaptively using meta-learning. Meta-learning here considers two separate random forest regression models, built on 390 time-series features, to rank 22 univariate forecasting methods and recommend ensemble size. The forecasting ensemble is consequently formed from methods ranked as the best, and forecasts are pooled using either simple or weighted average (with a weight corresponding to reciprocal rank). The proposed approach was tested on 12561 micro-economic time-series (expanded to 38633 for various forecasting horizons) of M4 competition where meta-learning outperformed Theta and Comb benchmarks by relative forecasting errors for all data types and horizons. Best overall results were achieved by weighted pooling with a symmetric mean absolute percentage error of 9.21% versus 11.05% obtained using the Theta method.  ( 2 min )
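    The pooling and error measure named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the forecasts are assumed to arrive already ordered from best-ranked to worst-ranked method, and sMAPE is assumed to follow the usual M4 definition.

```python
import numpy as np

def reciprocal_rank_weights(n_methods):
    """Weights proportional to 1/rank for ranks 1..n_methods, normalized to sum to 1."""
    ranks = np.arange(1, n_methods + 1)
    weights = 1.0 / ranks
    return weights / weights.sum()

def pool_forecasts(forecasts, weighted=True):
    """Pool forecasts of shape (n_methods, horizon); rows are ordered best-ranked first."""
    forecasts = np.asarray(forecasts, dtype=float)
    n_methods = forecasts.shape[0]
    if weighted:
        weights = reciprocal_rank_weights(n_methods)
    else:
        # Simple average: every selected method gets the same weight.
        weights = np.full(n_methods, 1.0 / n_methods)
    return weights @ forecasts

def smape(actual, forecast):
    """Symmetric MAPE in percent: 200 * mean(|y - f| / (|y| + |f|)).

    Undefined when an actual value and its forecast are both zero.
    """
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return 200.0 * np.mean(np.abs(actual - forecast) / (np.abs(actual) + np.abs(forecast)))
```

    With three methods the reciprocal-rank weights come out as 6/11, 3/11 and 2/11, so the best-ranked method contributes a little more than half of the pooled forecast.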
    Conditional Hierarchical Bayesian Tucker Decomposition for Genetic Data Analysis. (arXiv:1911.12426v3 [cs.LG] UPDATED)
    We develop methods for reducing the dimensionality of large data sets, common in biomedical applications. Learning about patients using genetic data often includes more features than observations, which makes direct supervised learning difficult. One method of reducing the feature space is to use latent Dirichlet allocation to group genetic variants in an unsupervised manner. Latent Dirichlet allocation describes a patient as a mixture of topics corresponding to genetic variants. This can be generalized as a Bayesian tensor decomposition to account for multiple feature variables. Our most significant contributions are with hierarchical topic modeling. We design distinct methods of incorporating hierarchical topic modeling, based on nested Chinese restaurant processes and Pachinko Allocation Machine, into Bayesian tensor decomposition. We apply these models to examine patients with one of four common types of cancer (breast, lung, prostate, and colorectal) and siblings with and without autism spectrum disorder. We linked the genes with their biological pathways and combine this information into a tensor of patients, counts of their genetic variants, and the genes' membership in pathways. We find that our trained models outperform baseline models, with respect to coherence, by up to 40%.  ( 2 min )
    A Statistical Decision-Theoretical Perspective on the Two-Stage Approach to Parameter Estimation. (arXiv:2204.00036v2 [stat.ME] UPDATED)
    One of the most important problems in system identification and statistics is how to estimate the unknown parameters of a given model. Optimization methods and specialized procedures, such as Empirical Minimization (EM) can be used in case the likelihood function can be computed. For situations where one can only simulate from a parametric model, but the likelihood is difficult or impossible to evaluate, a technique known as the Two-Stage (TS) Approach can be applied to obtain reliable parametric estimates. Unfortunately, there is currently a lack of theoretical justification for TS. In this paper, we propose a statistical decision-theoretical derivation of TS, which leads to Bayesian and Minimax estimators. We also show how to apply the TS approach on models for independent and identically distributed samples, by computing quantiles of the data as a first step, and using a linear function as the second stage. The proposed method is illustrated via numerical simulations.  ( 2 min )
    Bayesian Nonparametrics for Sparse Dynamic Networks. (arXiv:1607.01624v2 [stat.ML] UPDATED)
    In this paper we propose a Bayesian nonparametric approach to modelling sparse time-varying networks. A positive parameter is associated to each node of a network, which models the sociability of that node. Sociabilities are assumed to evolve over time, and are modelled via a dynamic point process model. The model is able to capture long term evolution of the sociabilities. Moreover, it yields sparse graphs, where the number of edges grows subquadratically with the number of nodes. The evolution of the sociabilities is described by a tractable time-varying generalised gamma process. We provide some theoretical insights into the model and apply it to three datasets: a simulated network, a network of hyperlinks between communities on Reddit, and a network of co-occurrences of words in Reuters news articles after the September 11th attacks.  ( 2 min )
    Enforcing fairness in private federated learning via the modified method of differential multipliers. (arXiv:2109.08604v2 [cs.LG] UPDATED)
    Federated learning with differential privacy, or private federated learning, provides a strategy to train machine learning models while respecting users' privacy. However, differential privacy can disproportionately degrade the performance of the models on under-represented groups, as these parts of the distribution are difficult to learn in the presence of noise. Existing approaches for enforcing fairness in machine learning models have considered the centralized setting, in which the algorithm has access to the users' data. This paper introduces an algorithm to enforce group fairness in private federated learning, where users' data does not leave their devices. First, the paper extends the modified method of differential multipliers to empirical risk minimization with fairness constraints, thus providing an algorithm to enforce fairness in the central setting. Then, this algorithm is extended to the private federated learning setting. The proposed algorithm, \texttt{FPFL}, is tested on a federated version of the Adult dataset and an "unfair" version of the FEMNIST dataset. The experiments on these datasets show how private federated learning accentuates unfairness in the trained models, and how FPFL is able to mitigate such unfairness.  ( 2 min )
    Towards a Unified Framework for Uncertainty-aware Nonlinear Variable Selection with Theoretical Guarantees. (arXiv:2204.07293v1 [stat.ML])
    We develop a simple and unified framework for nonlinear variable selection that incorporates model uncertainty and is compatible with a wide range of machine learning models (e.g., tree ensembles, kernel methods and neural network). In particular, for a learned nonlinear model $f(\mathbf{x})$, we consider quantifying the importance of an input variable $\mathbf{x}^j$ using the integrated gradient measure $\psi_j = \Vert \frac{\partial}{\partial \mathbf{x}^j} f(\mathbf{x})\Vert^2_2$. We then (1) provide a principled approach for quantifying variable selection uncertainty by deriving its posterior distribution, and (2) show that the approach is generalizable even to non-differentiable models such as tree ensembles. Rigorous Bayesian nonparametric theorems are derived to guarantee the posterior consistency and asymptotic uncertainty of the proposed approach. Extensive simulation confirms that the proposed algorithm outperforms existing classic and recent variable selection methods.  ( 2 min )
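    To make the importance measure $\psi_j$ concrete, here is a rough numerical sketch. It rests on assumptions that are not in the abstract: the squared norm is read as an average of squared partial derivatives over the observed samples, derivatives are approximated by central differences, and `gradient_importance` is a hypothetical helper name. It does not cover the posterior uncertainty part of the paper.

```python
import numpy as np

def gradient_importance(f, X, eps=1e-5):
    """Estimate psi_j as the sample mean of (df/dx_j)^2 for each input variable j.

    f   : callable mapping an (n, d) array to an (n,) array of predictions
    X   : (n, d) array of observed inputs
    eps : step size for the central finite difference (an assumption of this sketch)
    """
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]
    psi = np.zeros(n_features)
    for j in range(n_features):
        step = np.zeros(n_features)
        step[j] = eps
        # Central difference approximation of the partial derivative at every sample.
        grad_j = (f(X + step) - f(X - step)) / (2.0 * eps)
        psi[j] = np.mean(grad_j ** 2)
    return psi
```

    A variable that the model never uses gets a score near zero, which is what makes the measure usable for selection.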
    Tighter Theory for Local SGD on Identical and Heterogeneous Data. (arXiv:1909.04746v4 [cs.LG] UPDATED)
    We provide a new analysis of local SGD, removing unnecessary assumptions and elaborating on the difference between two data regimes: identical and heterogeneous. In both cases, we improve the existing theory and provide values of the optimal stepsize and optimal number of local iterations. Our bounds are based on a new notion of variance that is specific to local SGD methods with different data. The tightness of our results is guaranteed by recovering known statements when we plug $H=1$, where $H$ is the number of local steps. The empirical evidence further validates the severe impact of data heterogeneity on the performance of local SGD.  ( 2 min )
    Causal Transformer for Estimating Counterfactual Outcomes. (arXiv:2204.07258v1 [cs.LG])
    Estimating counterfactual outcomes over time from observational data is relevant for many applications (e.g., personalized medicine). Yet, state-of-the-art methods build upon simple long short-term memory (LSTM) networks, thus rendering inferences for complex, long-range dependencies challenging. In this paper, we develop a novel Causal Transformer for estimating counterfactual outcomes over time. Our model is specifically designed to capture complex, long-range dependencies among time-varying confounders. For this, we combine three transformer subnetworks with separate inputs for time-varying covariates, previous treatments, and previous outcomes into a joint network with in-between cross-attentions. We further develop a custom, end-to-end training procedure for our Causal Transformer. Specifically, we propose a novel counterfactual domain confusion loss to address confounding bias: it aims to learn adversarial balanced representations, so that they are predictive of the next outcome but non-predictive of the current treatment assignment. We evaluate our Causal Transformer based on synthetic and real-world datasets, where it achieves superior performance over current baselines. To the best of our knowledge, this is the first work proposing transformer-based architecture for estimating counterfactual outcomes from longitudinal data.  ( 2 min )
    Warped Dynamic Linear Models for Time Series of Counts. (arXiv:2110.14790v2 [stat.ME] UPDATED)
    Dynamic Linear Models (DLMs) are commonly employed for time series analysis due to their versatile structure, simple recursive updating, ability to handle missing data, and probabilistic forecasting. However, the options for count time series are limited: Gaussian DLMs require continuous data, while Poisson-based alternatives often lack sufficient modeling flexibility. We introduce a novel semiparametric methodology for count time series by warping a Gaussian DLM. The warping function has two components: a (nonparametric) transformation operator that provides distributional flexibility and a rounding operator that ensures the correct support for the discrete data-generating process. We develop conjugate inference for the warped DLM, which enables analytic and recursive updates for the state space filtering and smoothing distributions. We leverage these results to produce customized and efficient algorithms for inference and forecasting, including Monte Carlo simulation for offline analysis and an optimal particle filter for online inference. This framework unifies and extends a variety of discrete time series models and is valid for natural counts, rounded values, and multivariate observations. Simulation studies illustrate the excellent forecasting capabilities of the warped DLM. The proposed approach is applied to a multivariate time series of daily overdose counts and demonstrates both modeling and computational successes.  ( 2 min )
    Universal approximation property of invertible neural networks. (arXiv:2204.07415v1 [cs.LG])
    Invertible neural networks (INNs) are neural network architectures with invertibility by design. Thanks to their invertibility and the tractability of Jacobian, INNs have various machine learning applications such as probabilistic modeling, generative modeling, and representation learning. However, their attractive properties often come at the cost of restricting the layer designs, which poses a question on their representation power: can we use these models to approximate sufficiently diverse functions? To answer this question, we have developed a general theoretical framework to investigate the representation power of INNs, building on a structure theorem of differential geometry. The framework simplifies the approximation problem of diffeomorphisms, which enables us to show the universal approximation properties of INNs. We apply the framework to two representative classes of INNs, namely Coupling-Flow-based INNs (CF-INNs) and Neural Ordinary Differential Equations (NODEs), and elucidate their high representation power despite the restrictions on their architectures.  ( 2 min )
    Distributed Reconstruction of Noisy Pooled Data. (arXiv:2204.07491v1 [cs.IT])
    In the pooled data problem we are given a set of $n$ agents, each of which holds a hidden state bit, either $0$ or $1$. A querying procedure returns for a query set the sum of the states of the queried agents. The goal is to reconstruct the states using as few queries as possible. In this paper we consider two noise models for the pooled data problem. In the noisy channel model, the result for each agent flips with a certain probability. In the noisy query model, each query result is subject to random Gaussian noise. Our results are twofold. First, we present and analyze for both error models a simple and efficient distributed algorithm that reconstructs the initial states in a greedy fashion. Our novel analysis pins down the range of error probabilities and distributions for which our algorithm reconstructs the exact initial states with high probability. Secondly, we present simulation results of our algorithm and compare its performance with approximate message passing (AMP) algorithms that are conjectured to be optimal in a number of related problems.  ( 2 min )

  • Open

    [R][P] Mask Transfiner for High-Quality Instance Segmentation + Gradio Web Demo
    submitted by /u/Illustrious_Row_9971 [link] [comments]
    [P] Spoonfy: Turn any foreign-language video into effective listening practice
    Video (Despacito, slightly NSFW): https://drive.google.com/file/d/12qYKv_yaqGr9GWvHPtE9ng2foJVPfpoE/view?usp=sharing Code & more info: https://github.com/athairus/SpoonfyDemo Discord: https://discord.gg/7wcZZzeSQk Spoonfy is essentially the so-called Telenovela method (learning languages through subtitled video) on steroids: This demo uses a finetuned Facebook's M2M100 model to translate Spanish to English (finetuned to do literal translation instead of ordinary translation) and a wav2vec2 model to get Spanish word timings to present the literal (aka word-by-word) translations as karaoke-style lyrics. What sets Spoonfy apart from other solutions is the way it leverages the massive body of existing subtitled content out there to create learning material. Also because it's FOSS. More details in the code's README. I've been working on this project by myself for a few months now, I hope you see potential behind it like I do! If so (but also if not), I'd love to hear what you think. And I'd love to get your help improving on what I already built. I have plenty of ideas for how to make the translations even more accurate, the system more robust & able to handle more sources of content (YouTube, TikTok, Blu-Rays), etc. Thanks for checking it out! submitted by /u/athairus [link] [comments]  ( 1 min )
    [D] Current work on knowledge representation with your preference, and use of language models
    In robotics and autonomous systems, knowledge representation is an important aspect. What are your favorite methods for knowledge representation (formal logic, graphs, or something else), and why do you like that kind of representation? Considering the success of large language models, isn't it a good time to use them in a new kind of representation, so that robots or similar systems can make better decisions in an environment? I still feel there is no consensus in the community on the correct way to represent knowledge; correct me if I am wrong. submitted by /u/projekt_treadstone [link] [comments]  ( 1 min )
    [D] DALL-E 2 vs Disco Diffusion - SHOWDOWN!
    submitted by /u/nin_artificial [link] [comments]  ( 1 min )
    WACV vs. BMVC [R]
    How do they compare in terms of community, prestige, competitiveness, and impact? I have a paper accepted to a CVPR workshop and am considering extending it and submitting to one of these. The work is based on explainability in medical vision. It's more methods-oriented rather than large-scale experiments. What are your suggestions? submitted by /u/avd4292 [link] [comments]  ( 1 min )
    [N] [P] Access 100+ image, video & audio datasets in seconds with one line of code & stream them while training ML models with Activeloop Hub (more at docs.activeloop.ai, description & links in the comments below)
    submitted by /u/davidbun [link] [comments]  ( 5 min )
    [D] Is it ok to promise a dataset in your paper, get published and then not release it?
    Recently, I decided to explore NeRF and found a very interesting dataset in the NeRS paper of 3D models, which was published in NeurIPS 2021 four months ago. Authors promised to release their dataset: The filtered dataset with anonymized personally identifiable information (e.g. license plates and phone numbers), masks, initial camera poses, and optimized NeRS cameras will be made available on the project page. However, if you check their project page or github repo — there is nothing there. I do not have much experience in machine learning, but wonder whether it's ok to do this? My thinking was that it is something to look down upon, but in this case it is done by Carnegie Mellon University (which is a top-tier one in ML?) on a top-tier conference (NeurIPS 2021). So I assume it's fine? submitted by /u/throwmeaway-account [link] [comments]  ( 5 min )
    [D] Wasserstein distance lipschitz vs gaussian distribution
    Hi, I heard there are different ways to calculate the Wasserstein distance in a neural network context. First, we can convert the 1-d Wasserstein loss to its dual representation and constrain its size (to make it a Lipschitz function). We need to clip the weights to make our model a Lipschitz function. Second, we can make the neural network output a Gaussian distribution and use the easy closed form, with the network outputs as the mean and covariance matrix. So, what are the advantages and disadvantages of each? It may sound ambiguous, but I have not seen a study that compares the two in terms of representation quality, computation, etc. Thank you for reading. submitted by /u/Spiritual_Fig3632 [link] [comments]  ( 1 min )
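    For readers comparing the two options in this post, a small sketch of each ingredient may help. It makes assumptions the post does not state: the closed form shown is the squared 2-Wasserstein distance between Gaussians, restricted to diagonal covariances so that no matrix square root is needed, and the clipping threshold `c=0.01` is only an illustrative default. Both function names are hypothetical.

```python
import numpy as np

def w2_squared_diag_gaussians(mean1, std1, mean2, std2):
    """Squared 2-Wasserstein distance between two Gaussians with diagonal covariance.

    For diagonal covariances the closed form reduces to
    ||mean1 - mean2||^2 + ||std1 - std2||^2.
    """
    mean1, std1, mean2, std2 = (
        np.asarray(a, dtype=float) for a in (mean1, std1, mean2, std2)
    )
    return float(np.sum((mean1 - mean2) ** 2) + np.sum((std1 - std2) ** 2))

def clip_weights(weights, c=0.01):
    """Clamp every parameter array to [-c, c], the crude Lipschitz constraint
    used with the dual (critic-based) formulation."""
    return [np.clip(w, -c, c) for w in weights]
```

    The Gaussian route needs no critic network and no constraint, but it only measures the distance between the fitted Gaussians; the dual route makes no distributional assumption, at the cost of training a constrained critic.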
    [D] What do you use to make your blog/personal websites?
    I've noticed a lot of folks in ML have a personal website that doubles as a blog to write about their work/projects. As someone looking to build their own website along the same lines, I'm looking for frameworks to try and build it with. What framework do you use to design your site? submitted by /u/SwiftLynx [link] [comments]  ( 3 min )
    [Discussion] Interpretable Neural Network ... ?
    Hi All! I've been working on a linear method that extracts signals from images by learning a set of composable image filters. It can recompose an image using these filters, as seen on this biological histology tissue (real on the right, recomposed on the left): https://preview.redd.it/czgdk6edx0u81.png?width=768&format=png&auto=webp&s=8768f93a749fff7dc41e576a74403096e113942e Because it is a linear method that learns image filters, I had an idea: what if some components of a neural network could be replaced with a learnable set of filters? For those not in the know, image filters are similar to masks that upweight some parts of the image and downweight other parts, similar to a highlighter to select text and a pen to cross out words. I show how in the figure below: https://preview.redd.it/2azco14ex0u81.jpg?width=499&format=pjpg&auto=webp&s=f3795fec06daa13da61ec155159a0ad865524530 Learning a set of image filters with a neural network is a good idea, as neural networks are much more flexible and are considered to be "universal function approximators". So I wrote up a PyTorch package to pass the neural network feature weights from Convolutions and Max Pooling into the linear method to learn a relevant set of filters; results are comparable even on CIFAR10. The caveat is that there is no ReLU, no other activation functions, and no Dropout: only 1 main single linear layer that learns filters... an interpretable neural network! Results are all here (including ipynb comparing with base CNN and VGG16): https://github.com/AskExplain/Interpretable-Neural-Net I'll update the GitHub with some figures of why the single layer is interpretable soon ... In the meantime, discuss! submitted by /u/TryToExplainHow [link] [comments]  ( 3 min )
    [P] New Graph Data Augmentation Library
    Hello! I recently built grafog, a graph data augmentation library on top of PyTorch Geometric. You can chain together graph augmentations as done in albumentations or torchvision.transforms. Check it out: https://github.com/rish-16/grafog It has the following augmentations: Random Node Drop, Random Edge Drop, Normalize Features, MixUp Strategy, Node Feature Masking, Edge Feature Masking. https://preview.redd.it/c53r7gkrk0u81.png?width=689&format=png&auto=webp&s=8fbe668e82571a7fe5de9ebb5e4690dbd34032bb https://preview.redd.it/5zrj4gkrk0u81.png?width=863&format=png&auto=webp&s=5bd02ea4adaf86b8911fa89372be9f05f9010536 Happy augmenting! submitted by /u/rish-16 [link] [comments]
    [N]: How does OpenAI's DALL-E 2 work?
    submitted by /u/giugiacaglia [link] [comments]
    [D] What is the opposite of an ablative study?
    I have the feeling that this question may be really stupid, but I'll ask it anyway. In ML we often see ablative studies. What is the opposite called? In other words: a study that aims to improve a model, and once an improvement is reached, this new model is taken as the basis for further investigation? submitted by /u/Rogitus [link] [comments]  ( 2 min )
  • Open

    Build & share machine learning apps directly in browser using Gradio in Python
    submitted by /u/Illustrious_Row_9971 [link] [comments]
    AI Trippy Dream 19 - Exploring a Colorful Maze
    submitted by /u/LordPewPew777 [link] [comments]
    a better Boids simulation: An artificial life simulation of the flock of birds
    submitted by /u/Seitoh [link] [comments]  ( 1 min )
    New AI upscaler tool
    submitted by /u/Recent_Coffee_2551 [link] [comments]
    credit scoring for companies
    Hello everyone, I'm a newbie, so pardon me if you find my question stupid. I'm working on a project; here is its description in a nutshell: classify companies by whether they are going to go bankrupt, and based on the probability of default (probability of bankruptcy) give each company a score. For example, 88 percent to bankrupt gives score D, 21 percent to bankrupt gives score B, and 3 percent to bankrupt gives score A. My question is: what kind of models should I test? Should I go for machine learning algorithms such as logistic regression, kNN, or SVM? Should I go for neural networks (ANN)? Or can I use deep learning models like MLP or a probabilistic neural network? Any guidance or advice will be appreciated, and thanks a lot. submitted by /u/YeccAnon4 [link] [comments]  ( 1 min )
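    Whichever classifier is chosen, the last step described in this post (turning a probability of default into a letter score) is a simple binning problem. The sketch below is only an illustration: the cutoffs 0.10, 0.30 and 0.60 and the grade C are assumptions made here, chosen so that the three examples in the post come out as A, B and D. The function name is hypothetical.

```python
from bisect import bisect_right

# Hypothetical cutoffs on the probability of default; tune them to the portfolio.
CUTOFFS = [0.10, 0.30, 0.60]
GRADES = ["A", "B", "C", "D"]

def score_from_default_probability(p):
    """Map a probability of default in [0, 1] to a letter grade.

    p < 0.10 -> A, 0.10 <= p < 0.30 -> B, 0.30 <= p < 0.60 -> C, otherwise D.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability of default must lie in [0, 1]")
    return GRADES[bisect_right(CUTOFFS, p)]
```

    Any model that outputs a probability (logistic regression, a calibrated SVM, or a neural network with a sigmoid output) can feed this step, so the model choice and the scoring scale can be decided separately.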
  • Open

    LSTM for time series prediction
    Hi, I am doing a project where I have to predict sales for a company, and I am having some trouble with my LSTM model in Python. All the research I have done tells me that an LSTM is as good as, if not better than, an ARIMA model for forecasting on time series data, but my LSTM is significantly worse than my ARIMA model. Would it be possible for anyone to help me see if I have implemented it right? I have used both TensorFlow and PyTorch, and both are way worse than the ARIMA model. submitted by /u/magnussendjoko [link] [comments]  ( 2 min )
  • Open

    Learning style of play (different agents' actions) in the same offline RL environment?
    Hi, everyone. I'm a relative novice in RL, so bear with me as I try to formulate my question. I'm working on a chess bot that can play moves like a player (imitate their style of play) that is chosen from a set of players (that the bot is trained on) , if I give the bot the previous x moves. Using more technical terms, I'm trying to create an agent that is given a sequence of states-actions of another agent (player) and some representation of who that agent (player) is, and predict the next action (continue playing in the style of that player). I'm fairly certain this is an RL problem, as I don't know how to frame it as a supervised learning problem (I might be wrong). I've seen some papers that abstract offline RL as a sequence modeling problem (Decision Transformer, Trajectory Transformer), so I'm fairly certain I should continue in a similar manner. But I'm having a hard time trying to understand how to treat the difference in players. My instinct was to use some representation of the player as the reward, but then how would I even optimize for it or even give it as an input? Do I just add the player as a feature in the game state, but then what should be the reward? Has this been done before, or something similar? I couldn't really find any paper or code that worked on differentiating the training data by who made it (I might not be wording it correctly). submitted by /u/OverhypeUnderdeliver [link] [comments]  ( 3 min )
  • Open

    Using Data Warehousing as a Service (DWaaS) To Improve Customer Experience
    Data has become a huge area of business, helping businesses to drive their intelligence, make better decisions, and formulate strategic plans for future growth. The post Using Data Warehousing as a Service (DWaaS) To Improve Customer Experience appeared first on Data Science Central.  ( 7 min )
    ML classifies gravitational-wave glitches with high accuracy
    The LIGO observatory can detect astronomical events from billions of light years away. Terabytes of complex daily data make human analysis impossible. New study applies neural network with up to 97% classification accuracy. Caltech/MIT’s LIGO, the largest gravitational-wave observatory in the world, collects data on minute space-time ripples from cataclysmic astronomical events like colliding black… The post ML classifies gravitational-wave glitches with high accuracy appeared first on Data Science Central.  ( 4 min )
    Zero Trust Principles: What is Zero Trust Model?
    The central principle of the Zero Trust model is based on the authentication and verification of every device connecting to the network before they are trusted. Former Forrester analyst and veteran of the high-technology world, John Kindervag, who has been actively part of a wide array of network technology projects, coined the term “Zero Trust”… The post Zero Trust Principles: What is Zero Trust Model? appeared first on Data Science Central.  ( 4 min )

  • Open

    [R] Questions about ACL Rolling Review
    A few questions about ACL ARR: - If you request to reassign a reviewer, would the editor aim to reassign all three reviewers, or only that particular reviewer? Assume you have given a valid reason for reassignment and the editor is convinced. - If you request to reassign a reviewer, can the new reviewer see the previous reviews/scores before submitting their own review? Or would they get access to the previous revision only after submitting their own review? I already know (have heard) that in many cases reviewers are not available, and it becomes inevitable to get an entirely new set of reviews. But my questions are about the case where reviewer availability is not an issue. Just trying to find out how things are managed. submitted by /u/sim_inf [link] [comments]  ( 1 min )
    [R][P] MultiMAE: Multi-modal Multi-task Masked Autoencoders + Gradio Web Demo
    submitted by /u/Illustrious_Row_9971 [link] [comments]
    [Discussion] Is it possible to find a SWE job with a DS master's degree? Or would it be possible to make the transition later on?
    Say that a master's student graduates from a DS program with a heavy focus on data and CS (so knows the basics of CS like data structures and programming, and has also studied courses like data mining, big data analytics, and machine learning). What are their possible job openings and relatively easy positions to get into? My understanding is really rudimentary, so feel free to correct me: (1) Data scientist: this should be the most fitting and easy-to-get-interview position. Difficulty level 1/5. (2) Data analyst: same as 1. Difficulty level 1/5. (3) Data engineer: same as 1. Difficulty level 1/5. (4) Machine learning engineer: has a higher bar than 1, 2, and 3, and it's very difficult to get interviews without proper background and work experience. So it's very difficult for a master's graduate in DS to become one directly, but it's quite possible for a DS/DE (much less so for a DA) to move into MLE positions. Difficulty level 3/5. (5) Software engineer: it's a totally different realm and has very little skill overlap with 1, 2, and 3. So it's very hard for DS students to make the transition or land a SWE job. Difficulty level 5/5. submitted by /u/Competitive_Map_935 [link] [comments]  ( 1 min )
    [Project] Open-source playground to generate images from text using DALL-E Mini
    submitted by /u/koryoislie [link] [comments]
    [D] Incorporating node features into GNNs?
    Hey all, I am looking to learn more about how to incorporate node features with their embeddings for training. Specifically, I am working with gene-gene interaction networks, and also want to include RNA-sequencing quantifications. If anyone has a good introductory resource so I can familiarize myself with the process, I would really appreciate it! submitted by /u/PM_ME_A_ONELINER [link] [comments]  ( 1 min )
    [D] Moderation uniformity in subreddit
    This isn't meant to be a rant. Rather far from it. Yesterday I posted a legitimate question about database choices in r/MachineLearning. It asked what technical choices ML members are currently using for large-scale data ingestion in a continual learning environment. The post was removed. That post was (1) not a beginner question, (2) not offensive, (3) aimed at constructive discussion suitable as a mid-range ML query, and (4) marked with the appropriate flair. I finally posted it elsewhere. Yet today I see questions about transitions between DS -> MLE and quirky lab group names built from ML terms. These aren't even research questions. https://www.reddit.com/r/MachineLearning/comments/u503vz/d_is_it_easier_to_transition_to_mle_as_a_ds_or_swe/ https://www.reddit.com/r/MachineLearning/comments/u5091o/d_do_you_know_any_funny_team_names_with/ How is this fair moderation, genuinely? How can we improve the noise filter or do better moderation? submitted by /u/mlbloke [link] [comments]  ( 2 min )
    [D] Counterfactual Fairness
    So I watched this old video by Microsoft Research: https://www.youtube.com/watch?v=psA4U6nhZ70 To summarize, it uses the fairness criterion that the model gives the same prediction regardless of the value of sensitive attribute A, when using counterfactuals. That is, whether you're male or female shouldn't influence the model's predictions. The idea seems decent at first glance. But what if the "bias" or "unfairness" that the model creates based on sensitive attribute A isn't caused by a dataset bias but rather detects a real signal in the data? The model proposed by Microsoft Research doesn't take into consideration that the prediction on the sensitive attribute A does not necessarily consist of ONLY unfairness. They simply define it as such. Is such an algorithmic design choice not exactly one of the flaws that we seek to eliminate? Assuming that not all of the imbalance in the model's predictions on the sensitive attribute A is caused by "unfairness" but that some of it is caused by an inherent difference, are they not introducing direct human bias and unfairness into their model by explicitly designing the system to fit their own human (and political) bias? Don't get me wrong; the opposite is just as bad. Assuming that ALL of the imbalance in the prediction on the sensitive attribute A is caused by "inherent differences" is just as bad. Do you know of anyone that has tackled this in a good manner? How would you even begin to estimate how much is due to an "inherent difference" and how much is due to "bias, unfairness, noise" (or otherwise)? submitted by /u/caahel [link] [comments]  ( 4 min )
    [D] Paper Explained - Transformer Memory as a Differentiable Search Index (Full Video Walkthrough)
    https://youtu.be/qlB0TPBQ7YY Search engines work by building an index and then looking up things in it. Usually, that index is a separate data structure. In keyword search, we build and store reverse indices. In neural search, we build nearest-neighbor indices. This paper does something different: It directly trains a Transformer to return the ID of the most relevant document. No similarity search over embeddings or anything like this is performed, and no external data structure is needed, as the entire index is essentially captured by the model's weights. The paper experiments with various ways of representing documents and training the system, which works surprisingly well! OUTLINE: 0:00 - Intro 0:45 - Sponsor: Diffgram 1:35 - Paper overview 3:15 - The search problem, classic and neural 8:15 - Seq2seq for directly predicting document IDs 11:05 - Differentiable search index architecture 18:05 - Indexing 25:15 - Retrieval and document representation 33:25 - Training DSI 39:15 - Experimental results 49:25 - Comments & Conclusions Paper: https://arxiv.org/abs/2202.06991 submitted by /u/ykilcher [link] [comments]  ( 1 min )
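    For contrast with the approach described in the post, here is a minimal sketch of the classic keyword-search structure it mentions, a reverse (inverted) index. The toy documents, the whitespace tokenizer, and the AND-style query semantics are assumptions made for illustration; none of this comes from the paper.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase token to the sorted list of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return {token: sorted(ids) for token, ids in index.items()}


def lookup(index, query):
    """Return IDs of documents that contain every token of the query."""
    tokens = query.lower().split()
    if not tokens:
        return []
    result = set(index.get(tokens[0], []))
    for token in tokens[1:]:
        result &= set(index.get(token, []))
    return sorted(result)


# Hypothetical toy corpus, used only to exercise the functions above.
docs = {
    "d1": "transformers learn document ids",
    "d2": "nearest neighbor search over embeddings",
    "d3": "keyword search with an inverted index",
}
index = build_inverted_index(docs)
print(lookup(index, "search"))
print(lookup(index, "keyword search"))
```

    The index here is exactly the kind of separate data structure the post refers to; a differentiable search index would instead have the model emit the document ID directly.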
    [R] Useful method to train models for adversarial robustness
    submitted by /u/IncredibleMac [link] [comments]
    [P] Comparing Default VS Custom Reward Function for Optimal Health Management of a DeepRL Agent Playing Tekken
    submitted by /u/DIAMBRA_AIArena [link] [comments]  ( 1 min )
    [D] Spotify's Podcast Search Explained
    I wrote this article breaking down how Spotify have applied semantic search to enhance podcast discovery. I find it super interesting to see the approach Spotify have used in terms of data sources, model fine-tuning, and vector search - and wanted to show how to almost replicate it. Let me know if you have any thoughts on their approach! submitted by /u/jamescalam [link] [comments]
    [R] Machine learning in management of precautionary closures caused by lipophilic biotoxins
    In this work, we have covered a deep study of alternatives in order to improve the aquaculture of mussels with very noisy and unbalanced data https://www.sciencedirect.com/science/article/pii/S0168169922002733 submitted by /u/ennanco [link] [comments]
    [P] RR-GCN now supports multi-modal learning!
    We have just released v0.0.2 of our RR-GCN. This release includes support for multi-modal learning. Node embeddings can now be initialised with literal information or pre-trained embeddings for text and image data. Go check out our notebooks that show how we can achieve state-of-the-art performance on several benchmark datasets in less than one minute. Moreover, and more importantly, the representations produced by our RR-GCN are unsupervised and parameter-free (i.e. no training is required), making it possible to re-use them for multiple downstream ML tasks with high predictive performances. https://github.com/predict-idlab/RR-GCN submitted by /u/givdwiel [link] [comments]  ( 1 min )
    [D] Paper Explained – SEER explained: Vision Models more Robust & Fair when pretrained on UNCURATED images!?
    https://youtu.be/XHAoV_nKr1o This video explains the 10 billion parameter SEER model from MetaAI by Goyal et al. 2022. Paper link: https://arxiv.org/abs/2202.08360 Official implementation: https://github.com/facebookresearch/vissl/tree/main/projects/SEER Short description: The 10 billion parameter SEER model from u/MetaAI is *fairer*, even though it is trained on *uncurated* data. How so? Check out our take on this. Outline: 00:00 Training on uncurated data 01:12 Diffgram (Sponsor) 01:46 Toxicity in large models 02:43 What to do against model toxicity? 03:53 SEER model explained 06:52 SEER is fairer. But how? submitted by /u/AICoffeeBreak [link] [comments]  ( 1 min )
  • Open

    Is there an AI I could use to create an artificial Terence McKenna chatbot?
    He’s basically this wacky dead philosopher with 1000s of hours of his lectures on YT, and I was thinking it may be possible to create an AI personality of him from all of his recorded speech. Would there be a simple enough program I could download or anything of the sort? submitted by /u/Vaporshots [link] [comments]  ( 1 min )
    Boids: An artificial life simulation of a flock of birds
    submitted by /u/Seitoh [link] [comments]  ( 1 min )
    AI Trippy Dream 37 - Psychedelic Special Request
    submitted by /u/LordPewPew777 [link] [comments]
    Amazing Generation
    Looks amazing. The vibe is there. What do you think? How did he achieve this? Created by hand or through AI? https://www.tiktok.com/@ai.metascape/video/7086451191151971586 submitted by /u/PillowG1rl [link] [comments]
    Little Baby Chibi Lucy Loud
    submitted by /u/VIRUS-AOTOXIN [link] [comments]
    Artificial Intelligence is the Future of Deterrence
    submitted by /u/much_successes [link] [comments]
    LinkedIn Open-Sources ‘Feathr’, Its Feature Store To Simplify Machine Learning (ML) Feature Management And Improve Developer Productivity
    LinkedIn research team has recently open-sourced feature store, Feathr, created to simplify machine learning (ML) feature management and increase developer productivity. Feathr is used by dozens of LinkedIn applications to define features, compute them for training, deploy them in production, and share them across consumers. Compared to previous application-specific feature pipeline solutions, Feathr users reported significantly reduced time required to add new features to model training and improved runtime performance. Hundreds of ML models run on LinkedIn in Search, Feed, and Ads applications. Thousands of features about entities in the Economic Graph, such as companies, job postings, and LinkedIn members, power the models. The most time-consuming aspects of handling the ML applications at scale have been preparing and managing features. Continue reading the summary Github: https://github.com/linkedin/feathr LinkedIn Blog: https://engineering.linkedin.com/blog/2022/open-sourcing-feathr—linkedin-s-feature-store-for-productive-m submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    Artificial Nightmares: Crypt Walker || Clip Guided Diffusion AI Art Video [4K 20 FPS]
    submitted by /u/Thenamessd [link] [comments]
    I created a DIY python package to ensemble multimodal models
    Multimodal: A python package to ensemble speech, text, etc. models and build new applications. Sample Applications: Speech Named Entity Anonymizer, Speech Question Answering, Speech Generation Code: kritiksoman/Multimodal: Listen. Write. Speak. Read. Think. (github.com) submitted by /u/kritiksoman [link] [comments]
  • Open

    Rigorous treatment of MDPs, Bellman, etc. in continuous spaces?
    I am looking for a book/monograph that goes through all the basics of reinforcement learning for continuous spaces with mathematical rigor. The classic RL book from Sutton/Barto and the new RL theory book from Agarwal/Jiang/Kakade/Sun both stick to finite MDPs except for special cases like linear MDPs and the LQR. I assume that a general statement of the fundamentals for continuous spaces will require grinding through a lot of details on existence, measurability, suprema vs. maxima, etc., that are not issues in the finite case. Is this why these authors avoid it? clarifying edit: I don't need to go all the way to continuous time - just state and action spaces. Maybe one of Bertsekas's books? submitted by /u/quadprog [link] [comments]  ( 1 min )
    NEED HELP MAKING A BASIC PYTHON MODEL
    I have a two-column dataset: “Date” and “Result”. The Result column holds a 0 or 1 for each date. I need to make a reinforcement model that will predict whether the next result will be a 0 or 1. It needs to be done in a Jupyter notebook. submitted by /u/EffectiveBug4629 [link] [comments]  ( 1 min )
    From machine learning to sequential decision problems (reinforcement learning)
    Any reinforcement learning problem can be modeled as a sequential decision problem (SDP), which can always be modeled as a Markov decision process. An example of an SDP is a multiarmed bandit problem, where the state is the vector of beliefs about the performance of each arm (or beliefs about a continuous parametric model). Decisions are made by a policy, and there are four classes of policies. For some reason, the RL community tends to focus on just one of the four classes (UCB policies, which fall in the class of cost function approximations), but there are entire communities using each of the other three classes. See chapter 7 of my new book for a complete summary of the four classes for pure learning problems (aka bandit problems). See https://tinyurl.com/RLandSO/ Curious why Sutton and Barto (2nd edition) cover bandit problems in chapter 2, and then introduce MDPs in chapter 3. A bandit problem *is* an MDP! submitted by /u/powell-sda [link] [comments]  ( 1 min )
    Policy gradient vs. Policy iteration?
    Hello, I'm currently learning about MDPs and machine learning. I have a few questions that might be trivial or obvious, but I can't find many concrete answers online: (1) Are policy gradient and policy iteration similar/the same? From what I can gather, policy iteration is a type or subset of policy gradient algorithm; is this correct? (2) Are all policy learning methods less effective for large state spaces? From my understanding, you need to use some kind of value function iteration and heuristic function for larger state spaces because you can't encounter all states enough times to converge on an optimal policy. (3) Does convergence on a policy/value function find a local or global optimum? With neural nets, simple backpropagation may only find a local minimum for the cost function; is this true of MDP/RL iteration algorithms? Thanks!! submitted by /u/egad_a_mouse [link] [comments]  ( 2 min )
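    To make one of the two terms in the question concrete, here is a minimal sketch of tabular policy iteration on a toy MDP. Policy iteration is a dynamic-programming method that alternates policy evaluation with greedy improvement and needs the transition model; policy gradient methods instead adjust the parameters of a policy by gradient ascent on expected return, typically from sampled trajectories. The two-state MDP, its rewards, and the discount factor are hypothetical values chosen for illustration.

```python
def policy_evaluation(P, policy, gamma, tol=1e-10):
    """Iteratively compute V for a fixed deterministic policy."""
    V = [0.0] * len(P)
    while True:
        delta = 0.0
        for s in range(len(P)):
            v = sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][policy[s]])
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V


def policy_iteration(P, gamma=0.9):
    """Alternate evaluation and greedy improvement until the policy is stable."""
    policy = [0] * len(P)
    while True:
        V = policy_evaluation(P, policy, gamma)
        stable = True
        for s in range(len(P)):
            q = [
                sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                for a in range(len(P[s]))
            ]
            best = max(range(len(q)), key=q.__getitem__)
            # Switch only on a strict improvement so the loop terminates.
            if q[best] > q[policy[s]] + 1e-12:
                policy[s] = best
                stable = False
        if stable:
            return policy, V


# Hypothetical MDP: P[state][action] = [(probability, next_state, reward), ...]
P = [
    [[(1.0, 0, 0.0)], [(1.0, 1, 1.0)]],  # state 0: stay (reward 0) or move to 1 (reward 1)
    [[(1.0, 1, 2.0)], [(1.0, 0, 0.0)]],  # state 1: stay (reward 2) or move to 0 (reward 0)
]
policy, V = policy_iteration(P)
print(policy, [round(v, 3) for v in V])
```

    In this toy problem the procedure settles on moving from state 0 to state 1 and then staying there. For finite MDPs with a known model, policy iteration reaches an optimal policy; that guarantee does not carry over in general to gradient-based training with function approximation.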
    How to create a layer without inputs in tensorflow.
    In deep RL algorithms like PPO, a continuous stochastic policy is represented by a normal distribution. The recommended way to build this distribution is to get the mean by passing the state through a neural network and to use a state-independent layer to predict log_std. The layer that predicts log_std should be trainable by backprop, just like a bias. How can I create such a layer in TensorFlow 2? submitted by /u/Better-Ad8608 [link] [comments]  ( 1 min )
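    The idea in the question can be sketched without any framework: a "layer without inputs" is just a free trainable parameter. In TensorFlow 2 that role is usually played by a `tf.Variable` (or a weight created with `add_weight` inside a custom Keras layer) that the optimizer updates along with the network weights. The pure-Python class below is a hypothetical illustration of that structure, with a toy linear mean and a hand-derived gradient; it is not TensorFlow code.

```python
import math


class GaussianPolicy:
    """Mean depends on the state; log_std is a free parameter with no input."""

    def __init__(self, weights, log_std=0.0):
        self.weights = list(weights)  # toy linear stand-in for the mean network
        self.log_std = log_std        # state-independent, trainable like a bias

    def mean(self, state):
        return sum(w * x for w, x in zip(self.weights, state))

    def log_prob(self, state, action):
        """Log-density of a scalar action under N(mean(state), exp(log_std)^2)."""
        z = (action - self.mean(state)) / math.exp(self.log_std)
        return -0.5 * z * z - self.log_std - 0.5 * math.log(2.0 * math.pi)

    def grad_log_std(self, state, action):
        """Analytic derivative of log_prob with respect to log_std: z^2 - 1."""
        z = (action - self.mean(state)) / math.exp(self.log_std)
        return z * z - 1.0


policy = GaussianPolicy([0.5, -0.2], log_std=0.3)
print(policy.log_prob([1.0, 2.0], 0.7), policy.grad_log_std([1.0, 2.0], 0.7))
```

    Because log_std receives a gradient through the log-probability, a policy-gradient loss can train it exactly as it trains a bias term, which is the behaviour the question asks for.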
  • Open

    Machine Learning-based Anomaly Detection in Optical Fiber Monitoring. (arXiv:2204.07059v1 [cs.NI])
    Secure and reliable data communication in optical networks is critical for high-speed Internet. However, optical fibers, serving as the data transmission medium providing connectivity to billions of users worldwide, are prone to a variety of anomalies resulting from hard failures (e.g., fiber cuts) and malicious physical attacks (e.g., optical eavesdropping, also known as fiber tapping). Such anomalies may cause network disruption and thereby induce huge financial and data losses, or compromise the confidentiality of optical networks by gaining unauthorized access to the carried data, or gradually degrade the network operations. Therefore, it is essential to implement efficient anomaly detection, diagnosis, and localization schemes for enhancing the availability and reliability of optical networks. In this paper, we propose a data-driven approach to accurately and quickly detect, diagnose, and localize fiber anomalies including fiber cuts and optical eavesdropping attacks. The proposed method combines an autoencoder-based anomaly detection and an attention-based bidirectional gated recurrent unit algorithm, whereby the former is used for fault detection and the latter is adopted for fault diagnosis and localization once an anomaly is detected by the autoencoder. We verify the efficiency of our proposed approach by experiments under various anomaly scenarios using real operational data. The experimental results demonstrate that: (i) the autoencoder detects any fiber fault or anomaly with an F1 score of 96.86%; and (ii) the attention-based bidirectional gated recurrent unit algorithm identifies the detected anomalies with an average accuracy of 98.2%, and localizes the faults with an average root mean square error of 0.19 m.  ( 2 min )
    Robust No-Regret Learning in Min-Max Stackelberg Games. (arXiv:2203.14126v2 [cs.GT] UPDATED)
    The behavior of no-regret learning algorithms is well understood in two-player min-max (i.e, zero-sum) games. In this paper, we investigate the behavior of no-regret learning in min-max games with dependent strategy sets, where the strategy of the first player constrains the behavior of the second. Such games are best understood as sequential, i.e., min-max Stackelberg, games. We consider two settings, one in which only the first player chooses their actions using a no-regret algorithm while the second player best responds, and one in which both players use no-regret algorithms. For the former case, we show that no-regret dynamics converge to a Stackelberg equilibrium. For the latter case, we introduce a new type of regret, which we call Lagrangian regret, and show that if both players minimize their Lagrangian regrets, then play converges to a Stackelberg equilibrium. We then observe that online mirror descent (OMD) dynamics in these two settings correspond respectively to a known nested (i.e., sequential) gradient descent-ascent (GDA) algorithm and a new simultaneous GDA-like algorithm, thereby establishing convergence of these algorithms to Stackelberg equilibrium. Finally, we analyze the robustness of OMD dynamics to perturbations by investigating online min-max Stackelberg games. We prove that OMD dynamics are robust for a large class of online min-max games with independent strategy sets. In the dependent case, we demonstrate the robustness of OMD dynamics experimentally by simulating them in online Fisher markets, a canonical example of a min-max Stackelberg game with dependent strategy sets.  ( 2 min )
    Open-Set Recognition: a Good Closed-Set Classifier is All You Need?. (arXiv:2110.06207v2 [cs.CV] CROSS LISTED)
    The ability to identify whether or not a test sample belongs to one of the semantic classes in a classifier's training set is critical to practical deployment of the model. This task is termed open-set recognition (OSR) and has received significant attention in recent years. In this paper, we first demonstrate that the ability of a classifier to make the 'none-of-above' decision is highly correlated with its accuracy on the closed-set classes. We find that this relationship holds across loss objectives and architectures, and further demonstrate the trend both on the standard OSR benchmarks as well as on a large-scale ImageNet evaluation. Second, we use this correlation to boost the performance of a maximum logit score OSR 'baseline' by improving its closed-set accuracy, and with this strong baseline achieve state-of-the-art on a number of OSR benchmarks. Similarly, we boost the performance of the existing state-of-the-art method by improving its closed-set accuracy, but the resulting discrepancy with the strong baseline is marginal. Our third contribution is to present the 'Semantic Shift Benchmark' (SSB), which better respects the task of detecting semantic novelty, in contrast to other forms of distribution shift also considered in related sub-fields, such as out-of-distribution detection. On this new evaluation, we again demonstrate that there is negligible difference between the strong baseline and the existing state-of-the-art. Project Page: https://www.robots.ox.ac.uk/~vgg/research/osr/  ( 2 min )
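    The maximum logit score baseline mentioned in the abstract can be sketched in a few lines: take the arg-max class as usual, but answer "none of the above" when the largest logit is too small. The threshold value and the example logits below are hypothetical; how the authors pick an operating point is not stated in the abstract.

```python
def max_logit_predict(logits, threshold):
    """Return the arg-max class index, or None when the largest logit is below threshold."""
    best = max(range(len(logits)), key=logits.__getitem__)
    return best if logits[best] >= threshold else None


# Hypothetical logits from a 3-class closed-set classifier.
print(max_logit_predict([2.0, 7.5, 1.0], threshold=5.0))  # confident sample
print(max_logit_predict([0.4, 0.9, 0.2], threshold=5.0))  # low-confidence sample
```

    Under this rule a classifier with better-separated logits on known classes rejects unknown samples more cleanly, which is consistent with the correlation the abstract reports.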
    Gradient boosting for convex cone predict and optimize problems. (arXiv:2204.06895v1 [cs.LG])
    Many problems in engineering and statistics involve both predictive forecasting and decision-based optimization. Traditionally, predictive models are optimized independently from the final decision-based optimization problem. In contrast, a `smart, predict then optimize' (SPO) framework optimizes prediction models to explicitly minimize the final downstream decision loss. In this paper we present dboost, a gradient boosting algorithm for training prediction model ensembles to minimize decision regret. The dboost framework supports any convex optimization program that can be cast as convex quadratic cone program and gradient boosting is performed by implicit differentiation of a custom fixed-point mapping. To our knowledge, the dboost framework is the first general purpose implementation of gradient boosting to predict and optimize problems. Experimental results comparing with state-of-the-art SPO methods show that dboost can further reduce out-of-sample decision regret.  ( 2 min )
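    A toy illustration of why decision regret and prediction error can disagree: the decision below is simply "pick the cheapest option", which is an assumption standing in for the convex cone programs that dboost targets, and all cost values are hypothetical.

```python
def best_decision(costs):
    """Index of the cheapest option under the given cost vector."""
    return min(range(len(costs)), key=costs.__getitem__)


def decision_regret(predicted_costs, true_costs):
    """Extra true cost paid by acting on predictions instead of the true costs."""
    chosen = best_decision(predicted_costs)
    optimal = best_decision(true_costs)
    return true_costs[chosen] - true_costs[optimal]


def mse(predicted, true):
    """Mean squared prediction error."""
    return sum((p - t) ** 2 for p, t in zip(predicted, true)) / len(true)


true_costs = [3.0, 1.0, 2.0]
pred_a = [5.0, 3.0, 4.0]  # every entry off by 2, but the ranking is preserved
pred_b = [3.0, 2.1, 2.0]  # close in value, but the cheapest option changes

for name, pred in (("A", pred_a), ("B", pred_b)):
    print(name, "mse:", mse(pred, true_costs), "regret:", decision_regret(pred, true_costs))
```

    Prediction A has the larger squared error yet zero regret, while prediction B is closer in value but leads to a worse decision. This gap is the motivation for training on decision loss directly, as the abstract describes.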
    YOLO-Pose: Enhancing YOLO for Multi Person Pose Estimation Using Object Keypoint Similarity Loss. (arXiv:2204.06806v1 [cs.CV])
    We introduce YOLO-pose, a novel heatmap-free approach for joint detection, and 2D multi-person pose estimation in an image based on the popular YOLO object detection framework. Existing heatmap based two-stage approaches are sub-optimal as they are not end-to-end trainable and training relies on a surrogate L1 loss that is not equivalent to maximizing the evaluation metric, i.e. Object Keypoint Similarity (OKS). Our framework allows us to train the model end-to-end and optimize the OKS metric itself. The proposed model learns to jointly detect bounding boxes for multiple persons and their corresponding 2D poses in a single forward pass and thus bringing in the best of both top-down and bottom-up approaches. Proposed approach doesn't require the postprocessing of bottom-up approaches to group detected keypoints into a skeleton as each bounding box has an associated pose, resulting in an inherent grouping of the keypoints. Unlike top-down approaches, multiple forward passes are done away with since all persons are localized along with their pose in a single inference. YOLO-pose achieves new state-of-the-art results on COCO validation (90.2% AP50) and test-dev set (90.3% AP50), surpassing all existing bottom-up approaches in a single forward pass without flip test, multi-scale testing, or any other test time augmentation. All experiments and results reported in this paper are without any test time augmentation, unlike traditional approaches that use flip-test and multi-scale testing to boost performance. Our training codes will be made publicly available at https://github.com/TexasInstruments/edgeai-yolov5 and https://github.com/TexasInstruments/edgeai-yolox  ( 2 min )
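    For readers unfamiliar with the metric, here is a minimal sketch of Object Keypoint Similarity, assuming the COCO-style definition: each visible keypoint contributes exp(-d^2 / (2 * s^2 * k^2)), where d is the distance to the ground truth, s^2 is the object area, and k is a per-keypoint constant. The constants and coordinates below are hypothetical, and the exact loss used in the paper is not reproduced here.

```python
import math


def oks(pred, gt, visible, area, kappas):
    """Average per-keypoint similarity over keypoints labelled visible (v > 0)."""
    total = 0.0
    count = 0
    for (px, py), (gx, gy), v, k in zip(pred, gt, visible, kappas):
        if v <= 0:
            continue  # unlabelled keypoints do not count
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        total += math.exp(-d2 / (2.0 * area * k * k))
        count += 1
    return total / count if count else 0.0


gt = [(10.0, 10.0), (20.0, 20.0)]
pred = [(10.0, 10.0), (23.0, 24.0)]  # second keypoint is 5 px off
print(oks(pred, gt, visible=[1, 1], area=100.0, kappas=[0.1, 0.1]))
```

    Since the score is a smooth function of the predicted coordinates, a quantity such as 1 - OKS can serve as a training loss, which is one way to read the abstract's claim of optimizing the metric itself.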
    MARF: Multiscale Adaptive-switch Random Forest for Leg Detection with 2D Laser Scanners. (arXiv:2204.06833v1 [cs.RO])
    For 2D laser-based tasks, e.g., people detection and people tracking, leg detection is usually the first step. Thus, it carries great weight in determining the performance of people detection and people tracking. However, many leg detectors ignore the inevitable noise and the multiscale characteristics of the laser scan, which makes them sensitive to the unreliable features of the point cloud and further degrades the performance of the leg detector. In this paper, we propose a multiscale adaptive-switch Random Forest (MARF) to overcome these two challenges. Firstly, the adaptive-switch decision tree is designed to use noise-sensitive features to conduct weighted classification and noise-invariant features to conduct binary classification, which makes our detector more robust to noise. Secondly, considering the multiscale property that the sparsity of the 2D point cloud is proportional to the length of laser beams, we design a multiscale random forest structure to detect legs at different distances. Moreover, the proposed approach allows us to discover a sparser human leg from point clouds than others. Consequently, our method shows an improved performance compared to other state-of-the-art leg detectors on the challenging Moving Legs dataset and retains the whole pipeline at a speed of 60+ FPS on low-computational laptops. Moreover, we further apply the proposed MARF to the people detection and tracking system, achieving a considerable gain in all metrics.  ( 2 min )
    Not All Patches are What You Need: Expediting Vision Transformers via Token Reorganizations. (arXiv:2202.07800v2 [cs.CV] UPDATED)
    Vision Transformers (ViTs) take all the image patches as tokens and construct multi-head self-attention (MHSA) among them. Complete leverage of these image tokens brings redundant computations since not all the tokens are attentive in MHSA. Examples include that tokens containing semantically meaningless or distractive image backgrounds do not positively contribute to the ViT predictions. In this work, we propose to reorganize image tokens during the feed-forward process of ViT models, which is integrated into ViT during training. For each forward inference, we identify the attentive image tokens between MHSA and FFN (i.e., feed-forward network) modules, which is guided by the corresponding class token attention. Then, we reorganize image tokens by preserving attentive image tokens and fusing inattentive ones to expedite subsequent MHSA and FFN computations. To this end, our method EViT improves ViTs from two perspectives. First, under the same amount of input image tokens, our method reduces MHSA and FFN computation for efficient inference. For instance, the inference speed of DeiT-S is increased by 50% while its recognition accuracy is decreased by only 0.3% for ImageNet classification. Second, by maintaining the same computational cost, our method empowers ViTs to take more image tokens as input for recognition accuracy improvement, where the image tokens are from higher resolution images. An example is that we improve the recognition accuracy of DeiT-S by 1% for ImageNet classification at the same computational cost of a vanilla DeiT-S. Meanwhile, our method does not introduce more parameters to ViTs. Experiments on the standard benchmarks show the effectiveness of our method. The code is available at https://github.com/youweiliang/evit  ( 2 min )
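    The token reorganization step described in the abstract can be sketched as follows: keep the tokens that receive the most class-token attention and merge the rest into a single token. Fusing by an attention-weighted average is an assumption made for this sketch, and the token vectors, attention values, and `keep` count are hypothetical.

```python
def reorganize_tokens(tokens, cls_attention, keep):
    """Keep the `keep` most attended tokens and fuse the remainder into one token."""
    order = sorted(range(len(tokens)), key=lambda i: cls_attention[i], reverse=True)
    kept_idx = sorted(order[:keep])  # preserve the original token order
    rest_idx = order[keep:]
    kept = [tokens[i] for i in kept_idx]
    if not rest_idx:
        return kept
    weights = [cls_attention[i] for i in rest_idx]
    total = sum(weights)
    if total == 0:
        # Fall back to a plain average when all remaining attention is zero.
        weights = [1.0] * len(rest_idx)
        total = float(len(rest_idx))
    dim = len(tokens[0])
    fused = [
        sum(w * tokens[i][d] for w, i in zip(weights, rest_idx)) / total
        for d in range(dim)
    ]
    return kept + [fused]


tokens = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [4.0, 0.0]]
cls_attention = [0.5, 0.1, 0.3, 0.1]
print(reorganize_tokens(tokens, cls_attention, keep=2))
```

    Four tokens go in and three come out, so the later attention and feed-forward layers process a shorter sequence, which is where the speed-up reported in the abstract comes from.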
    Learning Task-Aware Energy Disaggregation: a Federated Approach. (arXiv:2204.06767v1 [cs.LG])
    We consider the problem of learning the energy disaggregation signals for residential load data. This task is referred to as non-intrusive load monitoring (NILM). To find individual devices' power consumption profiles from aggregated meter measurements, a machine learning model is usually trained on a large amount of training data coming from a number of residential homes. Yet collecting such residential load datasets requires both huge effort and customers' approval to share metering data, while load data coming from different regions or electricity users may exhibit heterogeneous usage patterns. Both practical concerns make training a single, centralized NILM model challenging. In this paper, we propose a decentralized and task-adaptive learning scheme for NILM tasks, where nested meta learning and federated learning steps are designed for learning task-specific models collectively. Simulation results on a benchmark dataset validate the proposed algorithm's performance on efficiently inferring appliance-level consumption for a variety of homes and appliances.  ( 2 min )
    Ranking Feature-Block Importance in Artificial Multiblock Neural Networks. (arXiv:2109.10279v2 [cs.LG] UPDATED)
    In artificial neural networks, understanding the contributions of input features on the prediction fosters model explainability and delivers relevant information about the dataset. While typical setups for feature importance ranking assess input features individually, in this study, we go one step further and rank the importance of groups of features, denoted as feature-blocks. A feature-block can contain features of a specific type or features derived from a particular source, which are presented to the neural network in separate input branches (multiblock ANNs). This work presents three methods pursuing distinct strategies to rank features in multiblock ANNs by their importance: (1) a composite strategy building on individual feature importance rankings, (2) a knock-in, and (3) a knock-out strategy. While the composite strategy builds on state-of-the-art feature importance rankings, knock-in and knock-out strategies evaluate the block as a whole via a mutual information criterion. Our experiments consist of a simulation study validating all three approaches, followed by a case study on two distinct real-world datasets to compare the strategies. We conclude that each strategy has its merits for specific application scenarios.  ( 2 min )
    Finding MNEMON: Reviving Memories of Node Embeddings. (arXiv:2204.06963v1 [cs.LG])
    Previous security research efforts orbiting around graphs have been exclusively focusing on either (de-)anonymizing the graphs or understanding the security and privacy issues of graph neural networks. Little attention has been paid to understand the privacy risks of integrating the output from graph embedding models (e.g., node embeddings) with complex downstream machine learning pipelines. In this paper, we fill this gap and propose a novel model-agnostic graph recovery attack that exploits the implicit graph structural information preserved in the embeddings of graph nodes. We show that an adversary can recover edges with decent accuracy by only gaining access to the node embedding matrix of the original graph without interactions with the node embedding models. We demonstrate the effectiveness and applicability of our graph recovery attack through extensive experiments.  ( 2 min )
    GM-TOuNN: Graded Multiscale Topology Optimization using Neural Networks. (arXiv:2204.06682v1 [cs.CE])
    Multiscale topology optimization (M-TO) entails generating an optimal global topology, and an optimal set of microstructures at a smaller scale, for a physics-constrained problem. With the advent of additive manufacturing, M-TO has gained significant prominence. However, generating optimal microstructures at various locations can be computationally very expensive. As an alternate, graded multiscale topology optimization (GM-TO) has been proposed where one or more pre-selected and graded (parameterized) microstructural topologies are used to fill the domain optimally. This leads to a significant reduction in computation while retaining many of the benefits of M-TO. A successful GM-TO framework must: (1) be capable of efficiently handling numerous pre-selected microstructures, (2) be able to continuously switch between these microstructures during optimization, (3) ensure that the partition of unity is satisfied, and (4) discourage microstructure mixing at termination. In this paper, we propose to meet these requirements by exploiting the unique classification capacity of neural networks. Specifically, we propose a graded multiscale topology optimization using neural-network (GM-TOuNN) framework with the following features: (1) the number of design variables is only weakly dependent on the number of pre-selected microstructures, (2) it guarantees partition of unity while discouraging microstructure mixing, and (3) it supports automatic differentiation, thereby eliminating manual sensitivity analysis. The proposed framework is illustrated through several examples.  ( 2 min )
    Medical Application of Geometric Deep Learning for the Diagnosis of Glaucoma. (arXiv:2204.07004v1 [eess.IV])
    Purpose: (1) To assess the performance of geometric deep learning (PointNet) in diagnosing glaucoma from a single optical coherence tomography (OCT) 3D scan of the optic nerve head (ONH); (2) To compare its performance to that obtained with a standard 3D convolutional neural network (CNN), and with a gold-standard glaucoma parameter, i.e. retinal nerve fiber layer (RNFL) thickness. Methods: 3D raster scans of the ONH were acquired with Spectralis OCT for 477 glaucoma and 2,296 non-glaucoma subjects at the Singapore National Eye Centre. All volumes were automatically segmented using deep learning to identify 7 major neural and connective tissues including the RNFL, the prelamina, and the lamina cribrosa (LC). Each ONH was then represented as a 3D point cloud with 1,000 points chosen randomly from all tissue boundaries. To simplify the problem, all ONH point clouds were aligned with respect to the plane and center of Bruch's membrane opening. Geometric deep learning (PointNet) was then used to provide a glaucoma diagnosis from a single OCT point cloud. The performance of our approach was compared to that obtained with a 3D CNN, and with RNFL thickness. Results: PointNet was able to provide a robust glaucoma diagnosis solely from the ONH represented as a 3D point cloud (AUC=95%). The performance of PointNet was superior to that obtained with a standard 3D CNN (AUC=87%) and with that obtained from RNFL thickness alone (AUC=80%). Discussion: We provide a proof-of-principle for the application of geometric deep learning in the field of glaucoma. Our technique requires significantly less information as input to perform better than a 3D CNN, and with an AUC superior to that obtained from RNFL thickness alone. Geometric deep learning may have wide applicability in the field of Ophthalmology.  ( 2 min )
    Achieving Representative Data via Convex Hull Feasibility Sampling Algorithms. (arXiv:2204.06664v1 [stat.ML])
    Sampling biases in training data are a major source of algorithmic biases in machine learning systems. Although there are many methods that attempt to mitigate such algorithmic biases during training, the most direct and obvious way is simply collecting more representative training data. In this paper, we consider the task of assembling a training dataset in which minority groups are adequately represented from a given set of data sources. In essence, this is an adaptive sampling problem to determine if a given point lies in the convex hull of the means from a set of unknown distributions. We present adaptive sampling methods to determine, with high confidence, whether it is possible to assemble a representative dataset from the given data sources. We also demonstrate the efficacy of our policies in simulations in the Bernoulli and a multinomial setting.  ( 2 min )
    Word Embeddings Are Capable of Capturing Rhythmic Similarity of Words. (arXiv:2204.04833v2 [cs.CL] UPDATED)
    Word embedding systems such as Word2Vec and GloVe are well-known in deep learning approaches to NLP. This is largely due to their ability to capture semantic relationships between words. In this work we investigated their usefulness in capturing rhythmic similarity of words instead. The results show that the vectors these embeddings assign to rhyming words are more similar to each other than to those of other words. It is also revealed that GloVe performs relatively better than Word2Vec in this regard. We also proposed a first-of-its-kind metric for quantifying rhythmic similarity of a pair of words.  ( 2 min )
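    The comparison described above (are the vectors of rhyming words closer to each other than to other words?) is commonly measured with cosine similarity. The following is a minimal sketch under that assumption; the abstract does not say which similarity measure or metric the authors use, and the three-dimensional `toy_vectors` are invented for illustration rather than taken from Word2Vec or GloVe.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pair_similarity(pairs, vectors):
    """Average cosine similarity over a list of (word, word) pairs."""
    return sum(cosine_similarity(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)


# Hypothetical toy embeddings, NOT real Word2Vec/GloVe vectors.
toy_vectors = {
    "cat": [0.9, 0.1, 0.3],
    "hat": [0.8, 0.2, 0.4],
    "tree": [0.1, 0.9, 0.2],
}

rhyming = mean_pair_similarity([("cat", "hat")], toy_vectors)
non_rhyming = mean_pair_similarity([("cat", "tree"), ("hat", "tree")], toy_vectors)
print(rhyming > non_rhyming)
```

    With real embeddings, `toy_vectors` would be replaced by vectors loaded from a pretrained model, and the two pair lists by rhyming and non-rhyming word pairs.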
    BrainGB: A Benchmark for Brain Network Analysis with Graph Neural Networks. (arXiv:2204.07054v1 [q-bio.NC])
    Mapping the connectome of the human brain using structural or functional connectivity has become one of the most pervasive paradigms for neuroimaging analysis. Recently, Graph Neural Networks (GNNs) motivated by geometric deep learning have attracted broad interest due to their established power for modeling complex networked data. Despite their established performance in other fields, there has not yet been a systematic study of how to design effective GNNs for brain network analysis. To bridge this gap, we present BrainGB, a benchmark for brain network analysis with GNNs. BrainGB standardizes the process by 1) summarizing brain network construction pipelines for both functional and structural neuroimaging modalities and 2) modularizing the implementation of GNN designs. We conduct extensive experiments on datasets across cohorts and modalities and recommend a set of general recipes for effective GNN designs on brain networks. To support open and reproducible research on GNN-based brain network analysis, we also host the BrainGB website at https://brainnet.us/ with models, tutorials, examples, as well as an out-of-box Python package. We hope that this work will provide useful empirical evidence and offer insights for future research in this novel and promising direction.  ( 2 min )
    Accelerated Policy Learning with Parallel Differentiable Simulation. (arXiv:2204.07137v1 [cs.LG])
    Deep reinforcement learning can generate complex control policies, but requires large amounts of training data to work effectively. Recent work has attempted to address this issue by leveraging differentiable simulators. However, inherent problems such as local minima and exploding/vanishing numerical gradients prevent these methods from being generally applied to control tasks with complex contact-rich dynamics, such as humanoid locomotion in classical RL benchmarks. In this work we present a high-performance differentiable simulator and a new policy learning algorithm (SHAC) that can effectively leverage simulation gradients, even in the presence of non-smoothness. Our learning algorithm alleviates problems with local minima through a smooth critic function, avoids vanishing/exploding gradients through a truncated learning window, and allows many physical environments to be run in parallel. We evaluate our method on classical RL control tasks, and show substantial improvements in sample efficiency and wall-clock time over state-of-the-art RL and differentiable simulation-based algorithms. In addition, we demonstrate the scalability of our method by applying it to the challenging high-dimensional problem of muscle-actuated locomotion with a large action space, achieving a greater than 17x reduction in training time over the best-performing established RL algorithm.  ( 2 min )
    Group-Sparse Matrix Factorization for Transfer Learning of Word Embeddings. (arXiv:2104.08928v2 [stat.ML] UPDATED)
    Unstructured text provides decision-makers with a rich data source in many domains, ranging from product reviews in retailing to nursing notes in healthcare. To leverage this information, words are typically translated into word embeddings -- vectors that encode the semantic relationships between words -- through unsupervised learning algorithms such as matrix factorization. However, learning word embeddings from new domains with limited training data can be challenging, because the meaning/usage may be different in the new domain, e.g., the word "positive" typically has positive sentiment, but often has negative sentiment in medical notes since it may imply that a patient is tested positive for a disease. Intuitively, we expect that only a small number of domain-specific words may have new meanings/usages. We propose an intuitive two-stage estimator that exploits this structure via a group-sparse penalty to efficiently transfer learn domain-specific word embeddings by combining large-scale text corpora (such as Wikipedia) with limited domain-specific text data. We bound the generalization error of our estimator, proving that it can achieve the same accuracy (compared to not transfer learning) with substantially less domain-specific data when only a small number of embeddings are altered between domains. Our results provide the first bounds on group-sparse matrix factorization, which may be of independent interest. We empirically evaluate the effectiveness of our approach compared to state-of-the-art fine-tuning heuristics from natural language processing.  ( 2 min )
    Latent Aspect Detection from Online Unsolicited Customer Reviews. (arXiv:2204.06964v1 [cs.CL])
    Within the context of review analytics, aspects are the features of products and services at which customers target their opinions and sentiments. Aspect detection helps product owners and service providers to identify shortcomings and prioritize customers' needs, and hence, maintain revenues and mitigate customer churn. Existing methods focus on detecting the surface form of an aspect by training supervised learning methods that fall short when aspects are latent in reviews. In this paper, we propose an unsupervised method to extract latent occurrences of aspects. Specifically, we assume that a customer undergoes a two-stage hypothetical generative process when writing a review: (1) deciding on an aspect amongst the set of aspects available for the product or service, and (2) writing the opinion words that are more interrelated to the chosen aspect from the set of all words available in a language. We employ latent Dirichlet allocation to learn the latent aspects distributions for generating the reviews. Experimental results on benchmark datasets show that our proposed method is able to improve the state of the art when the aspects are latent with no surface form in reviews.  ( 2 min )
    Neighborhood Attention Transformer. (arXiv:2204.07143v1 [cs.CV])
    We present Neighborhood Attention Transformer (NAT), an efficient, accurate and scalable hierarchical transformer that works well on both image classification and downstream vision tasks. It is built upon Neighborhood Attention (NA), a simple and flexible attention mechanism that localizes the receptive field for each query to its nearest neighboring pixels. NA is a localization of self-attention, and approaches it as the receptive field size increases. It is also equivalent in FLOPs and memory usage to Swin Transformer's shifted window attention given the same receptive field size, while being less constrained. Furthermore, NA includes local inductive biases, which eliminate the need for extra operations such as pixel shifts. Experimental results on NAT are competitive; NAT-Tiny reaches 83.2% top-1 accuracy on ImageNet with only 4.3 GFLOPs and 28M parameters, 51.4% mAP on MS-COCO and 48.4% mIoU on ADE20k. We will open-source our checkpoints, training script, configurations, and our CUDA kernel at: https://github.com/SHI-Labs/Neighborhood-Attention-Transformer .  ( 2 min )
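    To make the idea of a localized receptive field concrete, here is a minimal one-dimensional sketch in plain Python. It is not the authors' implementation or CUDA kernel: the function name, the 1D setting, and the choice to shift the window inward at the borders so that every query sees exactly `window` positions are assumptions made for illustration. When `window` equals the sequence length, the sketch reduces to ordinary scaled dot-product self-attention, which mirrors the abstract's remark that NA approaches self-attention as the receptive field grows.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def neighborhood_attention_1d(queries, keys, values, window):
    """Each query attends only to `window` nearby positions (illustrative 1D sketch)."""
    n = len(queries)
    window = min(window, n)  # a window larger than the sequence is full attention
    scale = 1.0 / math.sqrt(len(queries[0]))
    dim = len(values[0])
    outputs = []
    for i, q in enumerate(queries):
        # Window roughly centred on i, shifted inward at the borders (assumption).
        start = min(max(i - window // 2, 0), n - window)
        idx = list(range(start, start + window))
        scores = [scale * sum(a * b for a, b in zip(q, keys[j])) for j in idx]
        weights = softmax(scores)
        outputs.append(
            [sum(w * values[j][d] for w, j in zip(weights, idx)) for d in range(dim)]
        )
    return outputs


# Toy inputs, invented for illustration.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
local = neighborhood_attention_1d(x, x, x, window=3)
full = neighborhood_attention_1d(x, x, x, window=len(x))
```

    The cost per query is proportional to `window` rather than to the sequence length, which is the source of the efficiency gain over global self-attention.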
    Fix Bugs with Transformer through a Neural-Symbolic Edit Grammar. (arXiv:2204.06643v1 [cs.LG])
    We introduce NSEdit (neural-symbolic edit), a novel Transformer-based code repair method. Given only the source code that contains bugs, NSEdit predicts an editing sequence that can fix the bugs. The edit grammar is formulated as a regular language, and the Transformer uses it as a neural-symbolic scripting interface to generate editing programs. We modify the Transformer and add a pointer network to select the edit locations. An ensemble of rerankers is trained to re-rank the editing sequences generated by beam search. We fine-tune the rerankers on the validation set to reduce over-fitting. NSEdit is evaluated on various code repair datasets and achieves a new state-of-the-art accuracy ($24.04\%$) on the Tufano small dataset of the CodeXGLUE benchmark. NSEdit performs robustly when programs vary from package to package and when buggy programs are concrete. We conduct a detailed analysis of our methods and demonstrate the effectiveness of each component.  ( 2 min )
    Reflective Fiber Faults Detection and Characterization Using Long-Short-Term Memory. (arXiv:2204.07058v1 [cs.NI])
    To reduce operation-and-maintenance expenses (OPEX) and to ensure optical network survivability, optical network operators need to detect and diagnose faults in a timely manner and with high accuracy. With the rapid advancement of telemetry technology and data analysis techniques, data-driven approaches leveraging telemetry data to tackle the fault diagnosis problem have been gaining popularity due to their quick implementation and deployment. In this paper, we propose a novel multi-task learning model based on long short-term memory (LSTM) to detect, locate, and estimate the reflectance of fiber reflective faults (events) including the connectors and the mechanical splices by extracting insights from monitored data obtained by the optical time domain reflectometry (OTDR) principle commonly used for troubleshooting of fiber optic cables or links. The experimental results prove that the proposed method: (i) achieves a good detection capability and high localization accuracy within short measurement time even for low SNR values; and (ii) outperforms conventionally employed techniques.
    The Power of Linear Recurrent Neural Networks. (arXiv:1802.03308v6 [cs.LG] UPDATED)
    Recurrent neural networks are a powerful means to cope with time series. We show how linear, i.e., linearly activated recurrent neural networks (LRNNs) can approximate any time-dependent function f(t) given by a number of function values. The approximation can effectively be learned by simply solving a linear equation system; no backpropagation or similar methods are needed. Furthermore, the size of an LRNN can be reduced significantly in one step, after inspecting the eigenvalues of the network transition matrix, by taking only the most relevant components. Therefore, in contrast to others, we do not only learn network weights but also the network architecture. LRNNs have interesting properties: They end up in ellipse trajectories in the long run and allow the prediction of further values and compact representations of functions. We demonstrate this by several experiments, among them multiple superimposed oscillators (MSO), robotic soccer, and predicting stock prices. LRNNs outperform the previous state-of-the-art for the MSO task with a minimal number of units.  ( 2 min )
    Incompleteness of graph convolutional neural networks for points clouds in three dimensions. (arXiv:2201.07136v2 [stat.ML] UPDATED)
    Graph neural networks (GNN) are very popular methods in machine learning and have been applied very successfully to the prediction of the properties of molecules and materials. First-order GNNs are well known to be incomplete, i.e., there exist graphs that are distinct but appear identical when seen through the lens of the GNN. More complicated schemes have thus been designed to increase their resolving power. Applications to molecules (and more generally, point clouds), however, add a geometric dimension to the problem. The most straightforward and prevalent approach to construct graph representation for molecules regards atoms as vertices in a graph and draws a bond between each pair of atoms within a chosen cutoff. Bonds can be decorated with the distance between atoms, and the resulting "distance graph NNs" (dGNN) have empirically demonstrated excellent resolving power and are widely used in chemical ML, with all known indistinguishable graphs being resolved in the fully-connected limit. Here we show that even for the restricted case of fully-connected graphs induced by 3D atom clouds dGNNs are not complete. We construct pairs of distinct point clouds that generate graphs that, for any cutoff radius, are equivalent based on a first-order Weisfeiler-Lehman test. This class of degenerate structures includes chemically-plausible configurations, setting an ultimate limit to the expressive power of some of the well-established GNN architectures for atomistic machine learning. Models that explicitly use angular or directional information in the description of atomic environments can resolve these degeneracies.  ( 2 min )
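    The first-order Weisfeiler-Lehman test mentioned above can be illustrated with a short color-refinement routine. This is a generic sketch on plain, unlabeled graphs, not the paper's distance-decorated point-cloud construction; the function name and the adjacency-dictionary input format are assumptions for illustration. Both graphs are refined with a shared color palette so that their color histograms stay comparable from round to round.

```python
from collections import Counter


def wl_indistinguishable(graph_a, graph_b):
    """Return True if first-order WL color refinement cannot tell the graphs apart.

    Each graph is a dict mapping a node to a list of its neighbors.
    """
    graphs = (graph_a, graph_b)
    colors = [{v: 0 for v in g} for g in graphs]  # uniform initial coloring
    for _ in range(len(graph_a) + len(graph_b)):
        # A node's signature is its own color plus the multiset of neighbor colors.
        signatures = [
            {v: (c[v], tuple(sorted(c[u] for u in g[v]))) for v in g}
            for g, c in zip(graphs, colors)
        ]
        # Shared palette: the same signature gets the same color in both graphs.
        all_signatures = sorted({s for sigs in signatures for s in sigs.values()})
        palette = {sig: i for i, sig in enumerate(all_signatures)}
        colors = [{v: palette[s] for v, s in sigs.items()} for sigs in signatures]
        if Counter(colors[0].values()) != Counter(colors[1].values()):
            return False
    return True


# Classic example: two disjoint triangles versus a 6-cycle (both are 2-regular).
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(wl_indistinguishable(two_triangles, hexagon))
```

    The two example graphs are not isomorphic, yet every node has the same degree, so the refinement never splits the single color class. This is the kind of incompleteness the abstract refers to, here in its simplest unlabeled form.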
    Interpretability of Machine Learning Methods Applied to Neuroimaging. (arXiv:2204.07005v1 [cs.CV])
    Deep learning methods have become very popular for the processing of natural images, and were then successfully adapted to the neuroimaging field. As these methods are non-transparent, interpretability methods are needed to validate them and ensure their reliability. Indeed, it has been shown that deep learning models may obtain high performance even when using irrelevant features, by exploiting biases in the training set. Such undesirable situations can potentially be detected by using interpretability methods. Recently, many methods have been proposed to interpret neural networks. However, this domain is not mature yet. Machine learning users face two major issues when aiming to interpret their models: which method to choose, and how to assess its reliability? Here, we aim at providing answers to these questions by presenting the most common interpretability methods and metrics developed to assess their reliability, as well as their applications and benchmarks in the neuroimaging context. Note that this is not an exhaustive survey: we aimed to focus on the studies which we found to be the most representative and relevant.  ( 2 min )
    The MIT Supercloud Workload Classification Challenge. (arXiv:2204.05839v2 [cs.DC] UPDATED)
    High-Performance Computing (HPC) centers and cloud providers support an increasingly diverse set of applications on heterogeneous hardware. As Artificial Intelligence (AI) and Machine Learning (ML) workloads have become an increasingly large share of the compute workloads, new approaches to optimized resource usage, allocation, and deployment of new AI frameworks are needed. By identifying compute workloads and their utilization characteristics, HPC systems may be able to better match available resources with the application demand. By leveraging datacenter instrumentation, it may be possible to develop AI-based approaches that can identify workloads and provide feedback to researchers and datacenter operators for improving operational efficiency. To enable this research, we released the MIT Supercloud Dataset, which provides detailed monitoring logs from the MIT Supercloud cluster. This dataset includes CPU and GPU usage by jobs, memory usage, and file system logs. In this paper, we present a workload classification challenge based on this dataset. We introduce a labelled dataset that can be used to develop new approaches to workload classification and present initial results based on existing approaches. The goal of this challenge is to foster algorithmic innovations in the analysis of compute workloads that can achieve higher accuracy than existing methods. Data and code will be made publicly available via the Datacenter Challenge website: https://dcc.mit.edu.  ( 2 min )
    Learning and controlling the source-filter representation of speech with a variational autoencoder. (arXiv:2204.07075v1 [cs.SD])
    Understanding and controlling latent representations in deep generative models is a challenging yet important problem for analyzing, transforming and generating various types of data. In speech processing, inspired by the anatomical mechanisms of phonation, the source-filter model considers that speech signals are produced from a few independent and physically meaningful continuous latent factors, among which the fundamental frequency $f_0$ and the formants are of primary importance. In this work, we show that the source-filter model of speech production naturally arises in the latent space of a variational autoencoder (VAE) trained in an unsupervised manner on a dataset of natural speech signals. Using only a few seconds of labeled speech signals generated with an artificial speech synthesizer, we experimentally illustrate that $f_0$ and the formant frequencies are encoded in orthogonal subspaces of the VAE latent space and we develop a weakly-supervised method to accurately and independently control these speech factors of variation within the learned latent subspaces. Without requiring additional information such as text or human-labeled data, this results in a deep generative model of speech spectrograms that is conditioned on $f_0$ and the formant frequencies, and which is applied to the transformation of speech signals.  ( 2 min )
    Solving AC Power Flow with Graph Neural Networks under Realistic Constraints. (arXiv:2204.07000v1 [cs.LG])
    In this paper we propose a graph neural network architecture solving the AC power flow problem under realistic constraints. While the energy transition is changing the energy industry to a digitalized and decentralized energy system, the challenges are increasingly shifting to the distribution grid level to integrate new loads and generation technologies. To ensure a safe and resilient operation of distribution grids, AC power flow calculations are the means of choice to determine grid operating limits or analyze grid asset utilization in planning procedures. In our approach we demonstrate the development of a framework which makes use of graph neural networks to learn the physical constraints of the power flow. We present our model architecture on which we perform unsupervised training to learn a general solution of the AC power flow formulation that is independent of the specific topologies and supply tasks used for training. Finally, we demonstrate, validate and discuss our results on medium voltage benchmark grids.  ( 2 min )
    Generative power of a protein language model trained on multiple sequence alignments. (arXiv:2204.07110v1 [q-bio.BM])
    Computational models starting from large ensembles of evolutionarily related protein sequences capture a representation of protein families and learn constraints associated to protein structure and function. They thus open the possibility for generating novel sequences belonging to protein families. Protein language models trained on multiple sequence alignments, such as MSA Transformer, are highly attractive candidates to this end. We propose and test an iterative method that directly uses the masked language modeling objective to generate sequences using MSA Transformer. We demonstrate that the resulting sequences generally score better than those generated by Potts models, and even than natural sequences, for homology, coevolution and structure-based measures. Moreover, MSA Transformer better reproduces the higher-order statistics and the distribution of sequences in sequence space of natural data than Potts models, although Potts models better reproduce first- and second-order statistics. MSA Transformer is thus a strong candidate for protein sequence generation and protein design.
    Matrix Completion with Heterogonous Cost. (arXiv:2203.12120v2 [cs.LG] UPDATED)
    The matrix completion problem has been studied broadly under many underlying conditions. The problem has been explored under adaptive or non-adaptive, exact or estimation, single-phase or multi-phase, and many other categories. In most of these cases, the observation cost of each entry is uniform and is the same across the columns. However, in many real-life scenarios, we could expect elements from distinct columns or distinct positions to have different costs. In this paper, we explore this generalization under adaptive conditions. We approach the problem under two different cost models. In the first, entries from different columns have different observation costs, but within the same column each entry has a uniform cost. In the second, any two entries have different observation costs, whether they lie in the same column or in different columns. We provide complexity analysis of our algorithms and provide tightness guarantees.
    Semi-Supervised Convolutive NMF for Automatic Piano Transcription. (arXiv:2202.04989v2 [cs.SD] UPDATED)
    Automatic Music Transcription, which consists in transforming an audio recording of a musical performance into symbolic format, remains a difficult Music Information Retrieval task. In this work, which focuses on piano transcription, we propose a semi-supervised approach using low-rank matrix factorization techniques, in particular Convolutive Nonnegative Matrix Factorization. In the semi-supervised setting, only a single recording of each individual note is required. We show on the MAPS dataset that the proposed semi-supervised CNMF method performs better than state-of-the-art low-rank factorization techniques and a little worse than supervised deep learning state-of-the-art methods, although it suffers from generalization issues.
    HCFL: A High Compression Approach for Communication-Efficient Federated Learning in Very Large Scale IoT Networks. (arXiv:2204.06760v1 [cs.LG])
    Federated learning (FL) is a new artificial intelligence concept that enables Internet-of-Things (IoT) devices to learn a collaborative model without sending the raw data to centralized nodes for processing. Despite numerous advantages, low computing resources at IoT devices and high communication costs for exchanging model parameters make applications of FL in massive IoT networks very limited. In this work, we develop a novel compression scheme for FL, called high-compression federated learning (HCFL), for very large scale IoT networks. HCFL can reduce the data load for FL processes without changing their structure and hyperparameters. In this way, we not only can significantly reduce communication costs, but also make intensive learning processes more adaptable on low-computing resource IoT devices. Furthermore, we investigate a relationship between the number of IoT devices and the convergence level of the FL model and thereby better assess the quality of the FL process. We demonstrate our HCFL scheme in both simulations and mathematical analyses. Our proposed theoretical research can be used as a minimum level of satisfaction, proving that the FL process can achieve good performance when a determined configuration is met. Therefore, we show that HCFL is applicable in any FL-integrated networks with numerous IoT devices.
    Reinforcement Learning Policy Recommendation for Interbank Network Stability. (arXiv:2204.07134v1 [econ.GN])
    In this paper we analyze the effect of a policy recommendation on the performance of an artificial interbank market. Financial institutions stipulate lending agreements following a public recommendation and their individual information. The former, modeled by a reinforcement learning optimal policy trying to maximize the long term fitness of the system, gathers information on the economic environment and directs economic actors to create credit relationships based on the optimal choice between a low interest rate or high liquidity supply. The latter, based on the agents' balance sheet, makes it possible to determine the liquidity supply and interest rate that the banks optimally offer on the market. Based on the combination of the public and the private signal, financial institutions create or cut their credit connections over time via a preferential attachment evolving procedure able to generate a dynamic network. Our results show that the emergence of a core-periphery interbank network, combined with a certain level of homogeneity on the size of lenders and borrowers, are essential features to ensure the resilience of the system. Moreover, the reinforcement learning optimal policy recommendation plays a crucial role in mitigating systemic risk with respect to alternative policy instruments.
    Constrained Deep One-Class Feature Learning For Classifying Imbalanced Medical Images. (arXiv:2111.10610v2 [eess.IV] UPDATED)
    Medical image data are usually imbalanced across different classes. One-class classification has attracted increasing attention to address the data imbalance problem by distinguishing the samples of the minority class from the majority class. Previous methods generally aim to either learn a new feature space to map training samples together or to fit training samples by autoencoder-like models. These methods mainly focus on capturing either compact or descriptive features, where the information of the samples of the given class is not sufficiently utilized. In this paper, we propose a novel deep learning-based method to learn compact features by adding constraints on the bottleneck features, and to preserve descriptive features by training an autoencoder at the same time. Through jointly optimizing the constraining loss and the autoencoder's reconstruction loss, our method can learn more relevant features associated with the given class, making the majority and minority samples more distinguishable. Experimental results on three clinical datasets (including the MRI breast images, FFDM breast images and chest X-ray images) show state-of-the-art performance compared to previous methods.
    Supplementation of deep neural networks with simplified physics-based features to increase model prediction accuracy. (arXiv:2204.06764v1 [cs.ET])
    To improve predictive models for STEM applications, supplemental physics-based features computed from input parameters are introduced into single and multiple layers of a deep neural network (DNN). While many studies focus on informing DNNs with physics through differential equations or numerical simulation, much may be gained through integration of simplified relationships. To evaluate this hypothesis, a number of thin rectangular plates simply-supported on all edges are simulated for five materials. With plate dimensions and material properties as input features and fundamental natural frequency as the sole output, predictive performance of a purely data-driven DNN-based model is compared with models using additional inputs computed from simplified physical relationships among baseline parameters, namely plate weight, modulus of rigidity, and shear modulus. To better understand the benefit to model accuracy, these additional features are injected into various single and multiple DNN layers, and trained with four different dataset sizes. When these physics-enhanced models are evaluated against independent data of the same materials and similar dimensions to the training sets, supplementation with simplified physics-based parameters provides little reduction in prediction error over the baseline for models trained with dataset sizes of 60 and greater, although small improvement from 19.3% to 16.1% occurs when trained with a sparse size of 30. Conversely, notable accuracy gains occur when the independent test data is of material and dimensions not conforming to the training set. Specifically, when physics-enhanced data is injected into multiple DNN layers, reductions in error from 33.2% to 19.6%, 34.9% to 19.9%, 35.8% to 22.4%, and 43.0% to 28.4% are achieved for training dataset sizes of 261, 117, 60, and 30, respectively, demonstrating attainment of a degree of generalizability.
    ConDor: Self-Supervised Canonicalization of 3D Pose for Partial Shapes. (arXiv:2201.07788v2 [cs.CV] UPDATED)
    Progress in 3D object understanding has relied on manually canonicalized shape datasets that contain instances with consistent position and orientation (3D pose). This has made it hard to generalize these methods to in-the-wild shapes, e.g., from internet model collections or depth sensors. ConDor is a self-supervised method that learns to Canonicalize the 3D orientation and position for full and partial 3D point clouds. We build on top of Tensor Field Networks (TFNs), a class of permutation- and rotation-equivariant, and translation-invariant 3D networks. During inference, our method takes an unseen full or partial 3D point cloud at an arbitrary pose and outputs an equivariant canonical pose. During training, this network uses self-supervision losses to learn the canonical pose from an un-canonicalized collection of full and partial 3D point clouds. ConDor can also learn to consistently co-segment object parts without any supervision. Extensive quantitative results on four new metrics show that our approach outperforms existing methods while enabling new applications such as operation on depth images and annotation transfer.
    Concentration of Random Feature Matrices in High-Dimensions. (arXiv:2204.06935v1 [stat.ML])
    The spectra of random feature matrices provide essential information on the conditioning of the linear system used in random feature regression problems and are thus connected to the consistency and generalization of random feature models. Random feature matrices are asymmetric rectangular nonlinear matrices depending on two input variables, the data and the weights, which can make their characterization challenging. We consider two settings for the two input variables, either both are random variables or one is a random variable and the other is well-separated, i.e. there is a minimum distance between points. With conditions on the dimension, the complexity ratio, and the sampling variance, we show that the singular values of these matrices concentrate near their full expectation and near one with high probability. In particular, since the dimension depends only on the logarithm of the number of random weights or the number of data points, our complexity bounds can be achieved even in moderate dimensions for many practical settings. The theoretical results are verified with numerical experiments.
    Time Series of Non-Additive Metrics: Identification and Interpretation of Contributing Factors of Variance by Linear Decomposition. (arXiv:2204.06688v1 [cs.LG])
    The research paper addresses linear decomposition of time series of non-additive metrics that allows for the identification and interpretation of contributing factors (input features) of variance. Non-additive metrics, such as ratios, are widely used in a variety of domains. They commonly require preceding aggregations of the underlying variables that are used to calculate the metric of interest. The latter poses a dimensionality challenge when the input features and underlying variables are formed as two-dimensional arrays along elements, such as account or customer identifications, and time points. It rules out direct modeling of the time series of a non-additive metric as a function of input features. The article discusses a five-step approach: (1) segmentations of input features and the underlying variables of the metric that are supported by unsupervised autoencoders, (2) univariate or joint fittings of the metric by the aggregated input features on the segmented domains, (3) transformations of pre-screened input features according to the fitted models, (4) aggregation of the transformed features as time series, and (5) modelling of the metric time series as a sum of constrained linear effects of the aggregated features. Alternatively, approximation by numerical differentiation has been considered to linearize the metric. It allows for element level univariate or joint modeling of step (2). The process of these analytical steps allows for a backward-looking explanatory decomposition of the metric as a sum of time series of the surviving input features. The paper includes a synthetic example that studies loss-to-balance monthly rates of a hypothetical retail credit portfolio. To validate that no latent factors other than the surviving input features have significant impacts on the metric, Statistical Process Control has been introduced for the residual time series.
    End-to-end multi-particle reconstruction in high occupancy imaging calorimeters with graph neural networks. (arXiv:2204.01681v2 [physics.ins-det] UPDATED)
    We present an end-to-end reconstruction algorithm to build particle candidates from detector hits in next-generation granular calorimeters similar to that foreseen for the high-luminosity upgrade of the CMS detector. The algorithm exploits a distance-weighted graph neural network, trained with object condensation, a graph segmentation technique. Through a single-shot approach, the reconstruction task is paired with energy regression. We describe the reconstruction performance in terms of efficiency as well as in terms of energy resolution. In addition, we show the jet reconstruction performance of our method and discuss its inference computational cost. To our knowledge, this work is the first-ever example of single-shot calorimetric reconstruction of ${\cal O}(1000)$ particles in high-luminosity conditions with 200 pileup.
    Machine Learning State-of-the-Art with Uncertainties. (arXiv:2204.05173v2 [cs.LG] UPDATED)
    With the availability of data, hardware, software ecosystem and relevant skill sets, the machine learning community is undergoing a rapid development with new architectures and approaches appearing at high frequency every year. In this article, we conduct an exemplary image classification study in order to demonstrate how confidence intervals around accuracy measurements can greatly enhance the communication of research results as well as impact the reviewing process. In addition, we explore the hallmarks and limitations of this approximation. We discuss the relevance of this approach reflecting on a spotlight publication of ICLR22. A reproducible workflow is made available as an open-source adjoint to this publication. Based on our discussion, we make suggestions for improving the authoring and reviewing process of machine learning articles.
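    The abstract refers to confidence intervals around accuracy measurements without naming the construction here. As a minimal sketch, assuming the common normal (Wald) approximation to the binomial and a hypothetical helper name, an interval for a test-set accuracy can be computed as follows.

```python
import math

def accuracy_interval(correct, total, z=1.96):
    """Normal-approximation confidence interval for accuracy.

    z=1.96 corresponds to roughly 95% coverage. Assumes independent test
    samples; the approximation degrades for small `total` or for accuracy
    close to 0 or 1.
    """
    p = correct / total
    half_width = z * math.sqrt(p * (1.0 - p) / total)
    return max(0.0, p - half_width), min(1.0, p + half_width)
```

    For 900 correct predictions out of 1000 this gives an interval of roughly 0.88 to 0.92, so two models whose accuracies differ by one point on such a test set are not clearly distinguishable. One limitation is visible directly: at 100% accuracy the interval collapses to zero width, which is where alternatives to this approximation become relevant.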
    Characterizing the Fundamental Trade-offs in Learning Invariant Representations. (arXiv:2109.03386v2 [cs.LG] UPDATED)
    Many applications of representation learning, such as privacy-preservation, algorithmic fairness, and domain adaptation, desire explicit control over semantic information being discarded. This goal is formulated as satisfying two objectives: maximizing utility for predicting a target attribute while simultaneously being independent or invariant with respect to a known semantic attribute. Solutions to such problems lead to trade-offs between the two objectives when they are competing with each other. While existing works study bounds on these trade-offs, three questions still remain outstanding: 1) \emph{What are the exact fundamental trade-offs between utility and invariance?}, 2) \emph{What is the optimal dimensionality of the representation?}, and 3) \emph{What are the encoders (mapping data to a representation) that achieve the exact fundamental trade-offs and how can we estimate them from data?} This paper addresses these questions. We adopt a functional analysis perspective and derive closed-form solutions for the global optima of the underlying optimization problems under mild assumptions, which in turn yields closed formulae for the exact trade-offs, optimal representation dimensionality, and the corresponding encoders. We also numerically quantify the trade-offs on representative problems and compare them to those achieved by baseline invariant representation learning algorithms.
    StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2. (arXiv:2112.14683v2 [cs.CV] UPDATED)
    Videos show continuous events, yet most $-$ if not all $-$ video synthesis frameworks treat them discretely in time. In this work, we think of videos as what they should be $-$ time-continuous signals, and extend the paradigm of neural representations to build a continuous-time video generator. For this, we first design continuous motion representations through the lens of positional embeddings. Then, we explore the question of training on very sparse videos and demonstrate that a good generator can be learned by using as few as 2 frames per clip. After that, we rethink the traditional image + video discriminators pair and design a holistic discriminator that aggregates temporal information by simply concatenating frames' features. This decreases the training cost and provides richer learning signal to the generator, making it possible to train directly on 1024$^2$ videos for the first time. We build our model on top of StyleGAN2 and it is just ${\approx}5\%$ more expensive to train at the same resolution while achieving almost the same image quality. Moreover, our latent space features similar properties, enabling spatial manipulations that our method can propagate in time. We can generate arbitrarily long videos at arbitrarily high frame rate, while prior work struggles to generate even 64 frames at a fixed rate. Our model is tested on four modern 256$^2$ and one 1024$^2$-resolution video synthesis benchmarks. In terms of sheer metrics, it performs on average ${\approx}30\%$ better than the closest runner-up. Project website: https://universome.github.io.
    Kernel Thinning. (arXiv:2105.05842v7 [stat.ML] UPDATED)
    We introduce kernel thinning, a new procedure for compressing a distribution $\mathbb{P}$ more effectively than i.i.d. sampling or standard thinning. Given a suitable reproducing kernel $\mathbf{k}$ and $\mathcal{O}(n^2)$ time, kernel thinning compresses an $n$-point approximation to $\mathbb{P}$ into a $\sqrt{n}$-point approximation with comparable worst-case integration error across the associated reproducing kernel Hilbert space. With high probability, the maximum discrepancy in integration error is $\mathcal{O}_d(n^{-1/2}\sqrt{\log n})$ for compactly supported $\mathbb{P}$ and $\mathcal{O}_d(n^{-\frac{1}{2}} (\log n)^{(d+1)/2}\sqrt{\log\log n})$ for sub-exponential $\mathbb{P}$ on $\mathbb{R}^d$. In contrast, an equal-sized i.i.d. sample from $\mathbb{P}$ suffers $\Omega(n^{-1/4})$ integration error. Our sub-exponential guarantees resemble the classical quasi-Monte Carlo error rates for uniform $\mathbb{P}$ on $[0,1]^d$ but apply to general distributions on $\mathbb{R}^d$ and a wide range of common kernels. We use our results to derive explicit non-asymptotic maximum mean discrepancy bounds for Gaussian, Mat\'ern, and B-spline kernels and present two vignettes illustrating the practical benefits of kernel thinning over i.i.d. sampling and standard Markov chain Monte Carlo thinning, in dimensions $d=2$ through $100$.
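    To make the comparison in this abstract concrete, the sketch below computes the maximum mean discrepancy (MMD) between an $n$-point sample and the $\sqrt{n}$-point subset produced by standard thinning, which is the baseline the abstract compares against. It does not implement kernel thinning itself. The Gaussian kernel, its bandwidth, and all function names are assumptions made for illustration.

```python
import math

def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel between two points given as equal-length tuples."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def mmd(xs, ys, kernel=gaussian_kernel):
    """MMD between the empirical distributions of xs and ys (O(n^2) time)."""
    kxx = sum(kernel(a, b) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(kernel(a, b) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(kernel(a, b) for a in xs for b in ys) / (len(xs) * len(ys))
    return math.sqrt(max(0.0, kxx + kyy - 2.0 * kxy))

def standard_thinning(xs):
    """Baseline compression: keep every step-th point, about sqrt(n) in total."""
    m = max(1, math.isqrt(len(xs)))
    step = len(xs) // m
    return xs[::step][:m]
```

    Calling `mmd(sample, standard_thinning(sample))` gives the quantity that a better compression scheme would aim to reduce for the same output size.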
    Epileptic Seizure Risk Assessment by Multi-Channel Imaging of the EEG. (arXiv:2204.07034v1 [eess.SP])
    Refractory epileptic patients can suffer a seizure at any moment. Seizure prediction would substantially improve their lives. In this work, based on scalp EEG and its transformation into images, the likelihood of an epileptic seizure occurring at any moment is computed using an average of the softmax layer output (the likelihood) of a CNN, instead of the output of the classification layer. Results show that by analyzing the likelihood and thresholding it, prediction has higher sensitivity or a lower FPR/h. The best threshold for the likelihood was higher than 50% for 5 patients, and was lower for the remaining 36. However, more testing is needed, especially in new seizures, to better assess the real performance of this method. This work is a proof of concept with a positive outlook.
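    The averaging-and-thresholding step described above can be sketched as follows. The abstract does not specify how the softmax outputs are averaged, so the trailing moving window, the window length, and the function names are assumptions for illustration only.

```python
def averaged_likelihood(probs, window):
    """Trailing moving average of per-segment softmax probabilities.

    probs: sequence of preictal-class probabilities, one per EEG image.
    window: number of most recent outputs to average (assumed parameter).
    """
    out = []
    for i in range(len(probs)):
        start = max(0, i - window + 1)
        chunk = probs[start:i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def raise_alarms(likelihoods, threshold):
    """Flag time steps whose averaged likelihood exceeds the threshold."""
    return [p > threshold for p in likelihoods]
```

    Raising the threshold trades sensitivity for fewer false alarms per hour, which is why a per-patient threshold other than 50% can be preferable.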
    Learning Spectral Unions of Partial Deformable 3D Shapes. (arXiv:2104.00514v2 [cs.GR] UPDATED)
    Spectral geometric methods have brought revolutionary changes to the field of geometry processing. Of particular interest is the study of the Laplacian spectrum as a compact, isometry and permutation-invariant representation of a shape. Some recent works show how the intrinsic geometry of a full shape can be recovered from its spectrum, but there are no approaches that consider the more challenging problem of recovering the geometry from the spectral information of partial shapes. In this paper, we propose a possible way to fill this gap. We introduce a learning-based method to estimate the Laplacian spectrum of the union of partial non-rigid 3D shapes, without actually computing the 3D geometry of the union or any correspondence between those partial shapes. We do so by operating purely in the spectral domain and by defining the union operation between short sequences of eigenvalues. We show that the approximated union spectrum can be used as-is to reconstruct the complete geometry [MRC*19], perform region localization on a template [RTO*19] and retrieve shapes from a database, generalizing ShapeDNA [RWP06] to work with partialities. Working with eigenvalues allows us to deal with unknown correspondence, different sampling, and different discretizations (point clouds and meshes alike), making this operation especially robust and general. Our approach is data-driven and can generalize to isometric and non-isometric deformations of the surface, as long as these stay within the same semantic class (e.g., human bodies or horses), as well as to partiality artifacts not seen at training time.
    A Neural Network based Framework for Effective Laparoscopic Video Quality Assessment. (arXiv:2202.04517v2 [eess.IV] UPDATED)
    Video quality assessment is a challenging problem having a critical significance in the context of medical imaging. For instance, in laparoscopic surgery, the acquired video data suffers from different kinds of distortion that not only hinder surgery performance but also affect the execution of subsequent tasks in surgical navigation and robotic surgeries. For this reason, we propose in this paper neural network-based approaches for distortion classification as well as quality prediction. More precisely, a Residual Network (ResNet) based approach is firstly developed for simultaneous ranking and classification task. Then, this architecture is extended to make it appropriate for the quality prediction task by using an additional Fully Connected Neural Network (FCNN). To train the overall architecture (ResNet and FCNN models), transfer learning and end-to-end learning approaches are investigated. Experimental results, carried out on a new laparoscopic video quality database, have shown the efficiency of the proposed methods compared to recent conventional and deep learning based approaches.
    SemiMultiPose: A Semi-supervised Multi-animal Pose Estimation Framework. (arXiv:2204.07072v1 [cs.CV])
    Multi-animal pose estimation is essential for studying animals' social behaviors in neuroscience and neuroethology. Advanced approaches have been proposed to support multi-animal estimation and achieve state-of-the-art performance. However, these models rarely exploit unlabeled data during training even though real world applications have exponentially more unlabeled frames than labeled frames. Manually adding dense annotations for a large number of images or videos is costly and labor-intensive, especially for multiple instances. Given these deficiencies, we propose a novel semi-supervised architecture for multi-animal pose estimation, leveraging the abundant structures pervasive in unlabeled frames in behavior videos to enhance training, which is critical for sparsely-labeled problems. The resulting algorithm provides superior multi-animal pose estimation results on three animal experiments compared to the state-of-the-art baseline and exhibits more predictive power in sparsely-labeled data regimes.
    Transformers and the representation of biomedical background knowledge. (arXiv:2202.02432v2 [cs.CL] UPDATED)
    BioBERT and BioMegatron are Transformer models adapted for the biomedical domain based on publicly available biomedical corpora. As such, they have the potential to encode large-scale biological knowledge. We investigate the encoding and representation of biological knowledge in these models, and its potential utility to support inference in cancer precision medicine - namely, the interpretation of the clinical significance of genomic alterations. We compare the performance of different transformer baselines; we use probing to determine the consistency of encodings for distinct entities; and we use clustering methods to compare and contrast the internal properties of the embeddings for genes, variants, drugs and diseases. We show that these models do indeed encode biological knowledge, although some of this is lost in fine-tuning for specific tasks. Finally, we analyse how the models behave with regard to biases and imbalances in the dataset.
    Q-TART: Quickly Training for Adversarial Robustness and in-Transferability. (arXiv:2204.07024v1 [cs.CV])
    Raw deep neural network (DNN) performance is not enough; in real-world settings, computational load, training efficiency and adversarial security are just as or even more important. We propose to simultaneously tackle Performance, Efficiency, and Robustness, using our proposed algorithm Q-TART, Quickly Train for Adversarial Robustness and in-Transferability. Q-TART follows the intuition that samples highly susceptible to noise strongly affect the decision boundaries learned by DNNs, which in turn degrades their performance and adversarial robustness. By identifying and removing such samples, we demonstrate improved performance and adversarial robustness while using only a subset of the training data. Through our experiments we highlight Q-TART's high performance across multiple Dataset-DNN combinations, including ImageNet, and provide insights into the complementary behavior of Q-TART alongside existing adversarial training approaches to increase robustness by over 1.3% while using up to 17.9% less training time.
    HASA: Hybrid Architecture Search with Aggregation Strategy for Echinococcosis Classification and Ovary Segmentation in Ultrasound Images. (arXiv:2204.06697v1 [cs.CV])
    Different from handcrafted features, deep neural networks can automatically learn task-specific features from data. Due to this data-driven nature, they have achieved remarkable success in various areas. However, manual design and selection of suitable network architectures are time-consuming and require substantial effort of human experts. To address this problem, researchers have proposed neural architecture search (NAS) algorithms which can automatically generate network architectures but suffer from heavy computational cost and instability if searching from scratch. In this paper, we propose a hybrid NAS framework for ultrasound (US) image classification and segmentation. The hybrid framework consists of a pre-trained backbone and several searched cells (i.e., network building blocks), which takes advantage of the strengths of both NAS and the expert knowledge from existing convolutional neural networks. Specifically, two effective and lightweight operations, a mixed depth-wise convolution operator and a squeeze-and-excitation block, are introduced into the candidate operations to enhance the variety and capacity of the searched cells. These two operations not only decrease model parameters but also boost network performance. Moreover, we propose a re-aggregation strategy for the searched cells, aiming to further improve the performance for different vision tasks. We tested our method on two large US image datasets, including a 9-class echinococcosis dataset containing 9566 images for classification and an ovary dataset containing 3204 images for segmentation. Ablation experiments and comparison with other handcrafted or automatically searched architectures demonstrate that our method can generate more powerful and lightweight models for the above US image classification and segmentation tasks.
    Streamable Neural Audio Synthesis With Non-Causal Convolutions. (arXiv:2204.07064v1 [cs.SD])
    Deep learning models are mostly used in an offline inference fashion. However, this strongly limits the use of these models inside audio generation setups, as most creative workflows are based on real-time digital signal processing. Although approaches based on recurrent networks can be naturally adapted to this buffer-based computation, the use of convolutions still poses some serious challenges. To tackle this issue, the use of causal streaming convolutions has been proposed. However, this requires specific, more complex training and can impact the resulting audio quality. In this paper, we introduce a new method for producing non-causal streaming models. This makes any convolutional model compatible with real-time buffer-based processing. As our method is based on a post-training reconfiguration of the model, we show that it is able to transform models trained without causal constraints into a streaming model. We show how our method can be adapted to fit complex architectures with parallel branches. To evaluate our method, we apply it to the recent RAVE model, which provides high-quality real-time audio synthesis. We test our approach on multiple music and speech datasets and show that it is faster than overlap-add methods, while having no impact on the generation quality. Finally, we introduce two open-source implementations of our work as Max/MSP and PureData externals, and as a VST audio plugin. This endows traditional digital audio workstations with real-time neural audio synthesis on a laptop CPU.
    Geometric Deep Learning to Identify the Critical 3D Structural Features of the Optic Nerve Head for Glaucoma Diagnosis. (arXiv:2204.06931v1 [eess.IV])
    Purpose: The optic nerve head (ONH) undergoes complex and deep 3D morphological changes during the development and progression of glaucoma. Optical coherence tomography (OCT) is the current gold standard to visualize and quantify these changes, however the resulting 3D deep-tissue information has not yet been fully exploited for the diagnosis and prognosis of glaucoma. To this end, we aimed: (1) To compare the performance of two relatively recent geometric deep learning techniques in diagnosing glaucoma from a single OCT scan of the ONH; and (2) To identify the 3D structural features of the ONH that are critical for the diagnosis of glaucoma. Methods: In this study, we included a total of 2,247 non-glaucoma and 2,259 glaucoma scans from 1,725 subjects. All subjects had their ONHs imaged in 3D with Spectralis OCT. All OCT scans were automatically segmented using deep learning to identify major neural and connective tissues. Each ONH was then represented as a 3D point cloud. We used PointNet and dynamic graph convolutional neural network (DGCNN) to diagnose glaucoma from such 3D ONH point clouds and to identify the critical 3D structural features of the ONH for glaucoma diagnosis. Results: Both the DGCNN (AUC: 0.97$\pm$0.01) and PointNet (AUC: 0.95$\pm$0.02) were able to accurately detect glaucoma from 3D ONH point clouds. The critical points formed an hourglass pattern with most of them located in the inferior and superior quadrant of the ONH. Discussion: The diagnostic accuracy of both geometric deep learning approaches was excellent. Moreover, we were able to identify the critical 3D structural features of the ONH for glaucoma diagnosis that tremendously improved the transparency and interpretability of our method. Consequently, our approach may have strong potential to be used in clinical applications for the diagnosis and prognosis of a wide range of ophthalmic disorders.
    Integration of neural network and fuzzy logic decision making compared with bilayered neural network in the simulation of daily dew point temperature. (arXiv:2202.12256v2 [cs.LG] UPDATED)
    In this research, dew point temperature (DPT) is simulated using the data-driven approach. Adaptive Neuro-Fuzzy Inference System (ANFIS) is utilized as a data-driven technique to forecast this parameter at Tabriz in East Azerbaijan. Various input patterns, namely T min, T max, and T mean, are utilized for training the architecture whilst DPT is the model's output. The findings indicate that, in general, ANFIS method is capable of identifying data patterns with a high degree of accuracy. However, the approach demonstrates that processing time and computer resources may substantially increase by adding additional functions. Based on the results, the number of iterations and computing resources might change dramatically if new functionalities are included. As a result, tuning parameters have to be optimized inside the method framework. The findings demonstrate a high agreement between results by the data-driven technique (machine learning method) and the observed data. Using this prediction toolkit, DPT can be adequately forecasted solely based on the temperature distribution of Tabriz. This kind of modeling is extremely promising for predicting DPT at various sites. Besides, this study thoroughly compares the Bilayered Neural Network (BNN) and ANFIS models on various scales. Whilst the ANFIS model is extremely stable for almost all numbers of membership functions, the BNN model is highly sensitive to this scale factor to predict DPT.
    Improving Top-K Decoding for Non-Autoregressive Semantic Parsing via Intent Conditioning. (arXiv:2204.06748v1 [cs.CL])
    Semantic parsing (SP) is a core component of modern virtual assistants like Google Assistant and Amazon Alexa. While sequence-to-sequence-based auto-regressive (AR) approaches are common for conversational semantic parsing, recent studies employ non-autoregressive (NAR) decoders and reduce inference latency while maintaining competitive parsing quality. However, a major drawback of NAR decoders is the difficulty of generating top-k (i.e., k-best) outputs with approaches such as beam search. To address this challenge, we propose a novel NAR semantic parser that introduces intent conditioning on the decoder. Inspired by the traditional intent and slot tagging parsers, we decouple the top-level intent prediction from the rest of a parse. As the top-level intent largely governs the syntax and semantics of a parse, the intent conditioning allows the model to better control beam search and improves the quality and diversity of top-k outputs. We introduce a hybrid teacher-forcing approach to avoid training and inference mismatch. We evaluate the proposed NAR on conversational SP datasets, TOP & TOPv2. Like the existing NAR models, we maintain the O(1) decoding time complexity while generating more diverse outputs and improving the top-3 exact match (EM) by 2.4 points. In comparison with AR models, our model speeds up beam search inference by 6.7 times on CPU with competitive top-k EM.
    A Collection of Deep Learning-based Feature-Free Approaches for Characterizing Single-Objective Continuous Fitness Landscapes. (arXiv:2204.05752v2 [cs.LG] UPDATED)
    Exploratory Landscape Analysis is a powerful technique for numerically characterizing landscapes of single-objective continuous optimization problems. Landscape insights are crucial both for problem understanding as well as for assessing benchmark set diversity and composition. Despite the irrefutable usefulness of these features, they suffer from their own ailments and downsides. Hence, in this work we provide a collection of different approaches to characterize optimization landscapes. Similar to conventional landscape features, we require a small initial sample. However, instead of computing features based on that sample, we develop alternative representations of the original sample. These range from point clouds to 2D images and, therefore, are entirely feature-free. We demonstrate and validate our devised methods on the BBOB testbed and predict, with the help of Deep Learning, the high-level, expert-based landscape properties such as the degree of multimodality and the existence of funnel structures. The quality of our approaches is on par with methods relying on the traditional landscape features. Thereby, we provide an exciting new perspective on every research area which utilizes problem information such as problem understanding and algorithm design as well as automated algorithm configuration and selection.
    Proceedings of TDA: Applications of Topological Data Analysis to Data Science, Artificial Intelligence, and Machine Learning Workshop at SDM 2022. (arXiv:2204.01142v2 [math.AT] UPDATED)
    Topological Data Analysis (TDA) is a rigorous framework that borrows techniques from geometric and algebraic topology, category theory, and combinatorics in order to study the "shape" of complex high-dimensional data. Research in this area has grown significantly over the last several years bringing a deeply rooted theory to bear on practical applications in areas such as genomics, natural language processing, medicine, cybersecurity, energy, and climate change. Within some of these areas, TDA has also been used to augment AI and ML techniques. We believe there is further utility to be gained in this space that can be facilitated by a workshop bringing together experts (both theorists and practitioners) and non-experts. Currently there is an active community of pure mathematicians with research interests in developing and exploring the theoretical and computational aspects of TDA. Applied mathematicians and other practitioners are also present in the community but do not represent a majority. This speaks to the primary aim of this workshop which is to grow a wider community of interest in TDA. By fostering meaningful exchanges between these groups, from across the government, academia, and industry, we hope to create new synergies that can only come through building a mutual comprehensive awareness of the problem and solution spaces.
    ULF: Unsupervised Labeling Function Correction using Cross-Validation for Weak Supervision. (arXiv:2204.06863v1 [cs.LG])
    A way to overcome expensive and time-consuming manual data labeling is weak supervision - automatic annotation of data samples via a predefined set of labeling functions (LFs), rule-based mechanisms that generate potentially erroneous labels. In this work, we investigate noise reduction techniques for weak supervision based on the principle of k-fold cross-validation. In particular, we extend two frameworks for detecting the erroneous samples in manually annotated data to the weakly supervised setting. Our methods profit from leveraging the information about matching LFs and detect noisy samples more accurately. We also introduce a new algorithm for denoising the weakly annotated data called ULF, that refines the allocation of LFs to classes by estimating the reliable LFs-to-classes joint matrix. Evaluation on several datasets shows that ULF successfully improves weakly supervised learning without using any manually labeled data.
    A Melody-Unsupervision Model for Singing Voice Synthesis. (arXiv:2110.06546v2 [eess.AS] UPDATED)
    Recent studies in singing voice synthesis have achieved high-quality results leveraging advances in text-to-speech models based on deep neural networks. One of the main issues in training singing voice synthesis models is that they require melody and lyric labels to be temporally aligned with audio data. The temporal alignment is time-consuming manual work in preparing the training data. To address the issue, we propose a melody-unsupervision model that requires only audio-and-lyrics pairs without temporal alignment at training time but generates singing voice audio given a melody and lyrics input at inference time. The proposed model is composed of a phoneme classifier and a singing voice generator jointly trained in an end-to-end manner. The model can be fine-tuned by adjusting the amount of supervision with temporally aligned melody labels. Through experiments in melody-unsupervision and semi-supervision settings, we compare the audio quality of synthesized singing voice. We also show that the proposed model is capable of being trained with speech audio and text labels but can generate singing voice at inference time.
    Unsupervised Temporal Learning on Monocular Videos for 3D Human Pose Estimation. (arXiv:2012.01511v3 [cs.CV] UPDATED)
    In this paper we propose an unsupervised learning method to extract temporal information on monocular videos, where we detect and encode the subject of interest in each frame and leverage contrastive self-supervised (CSS) learning to extract rich latent vectors. Instead of simply treating the latent features of nearby frames as positive pairs and those of temporally-distant ones as negative pairs as in other CSS approaches, we explicitly disentangle each latent vector into a time-variant component and a time-invariant one. We then show that applying CSS only to the time-variant features and encouraging a gradual transition on them between nearby and distant frames, while also reconstructing the input, extracts rich temporal features into the time-variant component, well-suited for human pose estimation. Our approach reduces error by about 50\% compared to the standard CSS strategies, outperforms other unsupervised single-view methods and matches the performance of multi-view techniques.
    Fairness without Imputation: A Decision Tree Approach for Fair Prediction with Missing Values. (arXiv:2109.10431v2 [cs.LG] UPDATED)
    We investigate the fairness concerns of training a machine learning model using data with missing values. Even though there are a number of fairness intervention methods in the literature, most of them require a complete training set as input. In practice, data can have missing values, and data missing patterns can depend on group attributes (e.g. gender or race). Simply applying off-the-shelf fair learning algorithms to an imputed dataset may lead to an unfair model. In this paper, we first theoretically analyze different sources of discrimination risks when training with an imputed dataset. Then, we propose an integrated approach based on decision trees that does not require a separate process of imputation and learning. Instead, we train a tree with missing incorporated as attribute (MIA), which does not require explicit imputation, and we optimize a fairness-regularized objective function. We demonstrate that our approach outperforms existing fairness intervention methods applied to an imputed dataset, through several experiments on real-world datasets.
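The "missing incorporated as attribute" (MIA) idea mentioned above can be sketched on a single tree split. This is a minimal illustration, assuming binary 0/1 labels, Gini impurity, and one fixed threshold; the function names (`gini`, `weighted_gini`, `mia_split`) and the toy data are hypothetical, and the fairness-regularized objective from the abstract is not modeled here.

```python
def gini(labels):
    """Gini impurity for binary 0/1 labels; an empty node counts as pure."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def weighted_gini(left, right):
    """Impurity of a split, weighted by the number of samples on each side."""
    n = len(left) + len(right)
    return (len(left) * gini(left) + len(right) * gini(right)) / n


def mia_split(xs, ys, threshold):
    """Score the three MIA routing options for one feature and one threshold.

    A missing value is represented as None. Instead of imputing it, the split
    itself decides where missing values go:
      - "missing_left":        x <= threshold or missing goes left
      - "missing_right":       x <= threshold goes left, missing goes right
      - "missing_vs_observed": missing goes left, every observed value right
    Returns the best option and its weighted impurity.
    """
    scores = {}
    for name, missing_goes_left in (("missing_left", True), ("missing_right", False)):
        left, right = [], []
        for x, y in zip(xs, ys):
            go_left = missing_goes_left if x is None else x <= threshold
            (left if go_left else right).append(y)
        scores[name] = weighted_gini(left, right)

    missing = [y for x, y in zip(xs, ys) if x is None]
    observed = [y for x, y in zip(xs, ys) if x is not None]
    scores["missing_vs_observed"] = weighted_gini(missing, observed)

    best = min(scores, key=scores.get)
    return best, scores[best]


# Toy data (hypothetical): missing values co-occur with the positive class.
xs = [1.0, 2.0, None, 8.0, 9.0, None]
ys = [0, 0, 1, 1, 1, 1]
rule, impurity = mia_split(xs, ys, threshold=5.0)
```

On this toy data, routing missing values to the right yields a pure split, which no single imputed value is guaranteed to reproduce; a full tree learner would repeat this search over every feature and threshold.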
    A Simple and Efficient Sampling-based Algorithm for General Reachability Analysis. (arXiv:2112.05745v3 [eess.SY] UPDATED)
    In this work, we analyze an efficient sampling-based algorithm for general-purpose reachability analysis, which remains a notoriously challenging problem with applications ranging from neural network verification to safety analysis of dynamical systems. By sampling inputs, evaluating their images in the true reachable set, and taking their $\epsilon$-padded convex hull as a set estimator, this algorithm applies to general problem settings and is simple to implement. Our main contribution is the derivation of asymptotic and finite-sample accuracy guarantees using random set theory. This analysis informs algorithmic design to obtain an $\epsilon$-close reachable set approximation with high probability, provides insights into which reachability problems are most challenging, and motivates safety-critical applications of the technique. On a neural network verification task, we show that this approach is more accurate and significantly faster than prior work. Informed by our analysis, we also design a robust model predictive controller that we demonstrate in hardware experiments.
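The estimator described above (sample inputs, push them through the system, take the $\epsilon$-padded convex hull of the outputs) can be sketched in two dimensions with the standard library alone. This is an illustration under stated assumptions, not the paper's implementation: the map `f`, the input box, the sample count, and `eps` are all hypothetical choices, and the hull routine is a textbook monotone-chain algorithm.

```python
import math
import random


def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive when o->a->b turns left."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def dist_to_segment(q, a, b):
    """Euclidean distance from point q to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    denom = dx * dx + dy * dy
    t = 0.0
    if denom > 0:
        t = max(0.0, min(1.0, ((q[0] - a[0]) * dx + (q[1] - a[1]) * dy) / denom))
    return math.hypot(q[0] - (a[0] + t * dx), q[1] - (a[1] + t * dy))


def in_padded_hull(q, hull, eps):
    """True if q lies within distance eps of the convex polygon `hull`."""
    n = len(hull)
    edges = [(hull[i], hull[(i + 1) % n]) for i in range(n)]
    inside = n >= 3 and all(cross(a, b, q) >= 0 for a, b in edges)
    return inside or min(dist_to_segment(q, a, b) for a, b in edges) <= eps


def f(x):
    """Hypothetical system: a linear map whose image of [-1, 1]^2 is |u| + |v| <= 2."""
    return (x[0] + x[1], x[0] - x[1])


rng = random.Random(0)
samples = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(500)]
hull = convex_hull([f(x) for x in samples])
eps = 0.1  # padding radius (assumed value)

origin_reachable = in_padded_hull((0.0, 0.0), hull, eps)
far_point_reachable = in_padded_hull((3.0, 0.0), hull, eps)
```

Because every sampled output lies in the true reachable set, the unpadded hull is an inner approximation when that set is convex; the padding compensates for the parts the finite sample has not yet reached.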
    Procrastinated Tree Search: Black-box Optimization with Delayed, Noisy, and Multi-Fidelity Feedback. (arXiv:2110.07232v2 [cs.LG] UPDATED)
    In black-box optimization problems, we aim to maximize an unknown objective function, where the function is only accessible through feedbacks of an evaluation or simulation oracle. In real-life, the feedbacks of such oracles are often noisy and available after some unknown delay that may depend on the computation time of the oracle. Additionally, if the exact evaluations are expensive but coarse approximations are available at a lower cost, the feedbacks can have multi-fidelity. In order to address this problem, we propose a generic extension of hierarchical optimistic tree search (HOO), called ProCrastinated Tree Search (PCTS), that flexibly accommodates a delay and noise-tolerant bandit algorithm. We provide a generic proof technique to quantify regret of PCTS under delayed, noisy, and multi-fidelity feedbacks. Specifically, we derive regret bounds of PCTS enabled with delayed-UCB1 (DUCB1) and delayed-UCB-V (DUCBV) algorithms. Given a horizon $T$, PCTS retains the regret bound of non-delayed HOO for expected delay of $O(\log T)$ and worsens by $O(T^{\frac{1-\alpha}{d+2}})$ for expected delays of $O(T^{1-\alpha})$ for $\alpha \in (0,1]$. We experimentally validate on multiple synthetic functions and hyperparameter tuning problems that PCTS outperforms the state-of-the-art black-box optimization methods for feedbacks with different noise levels, delays, and fidelity.
    PEg TRAnsfer Workflow recognition challenge report: Does multi-modal data improve recognition? (arXiv:2202.05821v2 [cs.LG] UPDATED)
    This paper presents the design and results of the "PEg TRAnsfer Workflow recognition" (PETRAW) challenge whose objective was to develop surgical workflow recognition methods based on one or several modalities, among video, kinematic, and segmentation data, in order to study their added value. The PETRAW challenge provided a data set of 150 peg transfer sequences performed on a virtual simulator. This data set was composed of videos, kinematics, semantic segmentation, and workflow annotations which described the sequences at three different granularity levels: phase, step, and activity. Five tasks were proposed to the participants: three of them were related to the recognition of all granularities with one of the available modalities, while the others addressed the recognition with a combination of modalities. Average application-dependent balanced accuracy (AD-Accuracy) was used as the evaluation metric to take unbalanced classes into account and because it is more clinically relevant than a frame-by-frame score. Seven teams participated in at least one task and four of them in all tasks. The best results were obtained with the use of the video and the kinematics data, with an AD-Accuracy between 90% and 93% for the four teams who participated in all tasks. The improvement between video/kinematic-based methods and the uni-modality ones was significant for all of the teams. However, the difference in testing execution time between the video/kinematic-based and the kinematic-based methods has to be taken into consideration. Is it relevant to spend 20 to 200 times more computing time for less than 3% of improvement? The PETRAW data set is publicly available at www.synapse.org/PETRAW to encourage further research in surgical workflow recognition.
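To illustrate why a balanced metric is preferred when classes are unbalanced, the sketch below computes ordinary balanced accuracy (the mean of per-class recall) next to plain accuracy. It is not the challenge's application-dependent AD-Accuracy, whose exact definition is not given in the abstract; the labels and function names are hypothetical.

```python
from collections import defaultdict


def plain_accuracy(y_true, y_pred):
    """Fraction of frames whose predicted label matches the ground truth."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally regardless of size."""
    hits = defaultdict(int)
    support = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        support[truth] += 1
        hits[truth] += int(truth == pred)
    return sum(hits[c] / support[c] for c in support) / len(support)


# Hypothetical frame labels: the majority class dominates the sequence.
y_true = ["idle"] * 8 + ["grasp"] * 2
y_pred = ["idle"] * 10  # a degenerate model that always predicts the majority class

acc = plain_accuracy(y_true, y_pred)
bal = balanced_accuracy(y_true, y_pred)
```

Here the degenerate predictor looks acceptable under plain accuracy (0.8) but scores only 0.5 under balanced accuracy, because it never recognizes the minority class.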
    Modelling Non-Smooth Signals with Complex Spectral Structure. (arXiv:2203.06997v2 [stat.ML] UPDATED)
    The Gaussian Process Convolution Model (GPCM; Tobar et al., 2015a) is a model for signals with complex spectral structure. A significant limitation of the GPCM is that it assumes a rapidly decaying spectrum: it can only model smooth signals. Moreover, inference in the GPCM currently requires (1) a mean-field assumption, resulting in poorly calibrated uncertainties, and (2) a tedious variational optimisation of large covariance matrices. We redesign the GPCM model to induce a richer distribution over the spectrum with relaxed assumptions about smoothness: the Causal Gaussian Process Convolution Model (CGPCM) introduces a causality assumption into the GPCM, and the Rough Gaussian Process Convolution Model (RGPCM) can be interpreted as a Bayesian nonparametric generalisation of the fractional Ornstein-Uhlenbeck process. We also propose a more effective variational inference scheme, going beyond the mean-field assumption: we design a Gibbs sampler which directly samples from the optimal variational solution, circumventing any variational optimisation entirely. The proposed variations of the GPCM are validated in experiments on synthetic and real-world data, showing promising results.
    Adversarial Parameter Defense by Multi-Step Risk Minimization. (arXiv:2109.02889v2 [cs.LG] UPDATED)
    Previous studies demonstrate DNNs' vulnerability to adversarial examples and show that adversarial training can establish a defense against them. In addition, recent studies show that deep neural networks also exhibit vulnerability to parameter corruptions. The vulnerability of model parameters is of crucial value to the study of model robustness and generalization. In this work, we introduce the concept of parameter corruption and propose to leverage the loss change indicators for measuring the flatness of the loss basin and the parameter robustness of neural network parameters. On such basis, we analyze parameter corruptions and propose the multi-step adversarial corruption algorithm. To enhance neural networks, we propose the adversarial parameter defense algorithm that minimizes the average risk of multiple adversarial parameter corruptions. Experimental results show that the proposed algorithm can improve both the parameter robustness and accuracy of neural networks.
    The Pseudo Projection Operator: Applications of Deep Learning to Projection Based Filtering in Non-Trivial Frequency Regimes. (arXiv:2111.07140v3 [eess.SP] UPDATED)
    Traditional frequency based projection filters, or projection operators (PO), separate signal and noise through a series of transformations which remove frequencies where noise is present. However, this technique relies on a priori knowledge of what frequencies contain signal and noise and that these frequencies do not overlap, which is difficult to achieve in practice. To address these issues, we introduce a PO-neural network hybrid model, the Pseudo Projection Operator (PPO), which leverages a neural network to perform frequency selection. We compare the filtering capabilities of a PPO, PO, and denoising autoencoder (DAE) on the University of Rochester Multi-Modal Music Performance Dataset with a variety of added noise types. In the majority of experiments, the PPO outperforms both the PO and DAE. Based upon these results, we suggest future application of the PPO to filtering problems in the physical and biological sciences.
    Planting Undetectable Backdoors in Machine Learning Models. (arXiv:2204.06974v1 [cs.LG])
    Given the computational cost and technical expertise required to train machine learning models, users may delegate the task of learning to a service provider. We show how a malicious learner can plant an undetectable backdoor into a classifier. On the surface, such a backdoored classifier behaves normally, but in reality, the learner maintains a mechanism for changing the classification of any input, with only a slight perturbation. Importantly, without the appropriate "backdoor key", the mechanism is hidden and cannot be detected by any computationally-bounded observer. We demonstrate two frameworks for planting undetectable backdoors, with incomparable guarantees. First, we show how to plant a backdoor in any model, using digital signature schemes. The construction guarantees that given black-box access to the original model and the backdoored version, it is computationally infeasible to find even a single input where they differ. This property implies that the backdoored model has generalization error comparable with the original model. Second, we demonstrate how to insert undetectable backdoors in models trained using the Random Fourier Features (RFF) learning paradigm or in Random ReLU networks. In this construction, undetectability holds against powerful white-box distinguishers: given a complete description of the network and the training data, no efficient distinguisher can guess whether the model is "clean" or contains a backdoor. Our construction of undetectable backdoors also sheds light on the related issue of robustness to adversarial examples. In particular, our construction can produce a classifier that is indistinguishable from an "adversarially robust" classifier, but where every input has an adversarial example! In summary, the existence of undetectable backdoors represents a significant theoretical roadblock to certifying adversarial robustness.
    Your fairness may vary: Pretrained language model fairness in toxic text classification. (arXiv:2108.01250v3 [cs.CL] UPDATED)
    The popularity of pretrained language models in natural language processing systems calls for a careful evaluation of such models in down-stream tasks, which have a higher potential for societal impact. The evaluation of such systems usually focuses on accuracy measures. Our findings in this paper call for attention to be paid to fairness measures as well. Through the analysis of more than a dozen pretrained language models of varying sizes on two toxic text classification tasks (English), we demonstrate that focusing on accuracy measures alone can lead to models with wide variation in fairness characteristics. Specifically, we observe that fairness can vary even more than accuracy with increasing training data size and different random initializations. At the same time, we find that little of the fairness variation is explained by model size, despite claims in the literature. To improve model fairness without retraining, we show that two post-processing methods developed for structured, tabular data can be successfully applied to a range of pretrained language models. Warning: This paper contains samples of offensive text.
    Rethinking Architecture Design for Tackling Data Heterogeneity in Federated Learning. (arXiv:2106.06047v2 [cs.LG] UPDATED)
    Federated learning is an emerging research paradigm enabling collaborative training of machine learning models among different organizations while keeping data private at each institution. Despite recent progress, there remain fundamental challenges such as the lack of convergence and the potential for catastrophic forgetting across real-world heterogeneous devices. In this paper, we demonstrate that self-attention-based architectures (e.g., Transformers) are more robust to distribution shifts and hence improve federated learning over heterogeneous data. Concretely, we conduct the first rigorous empirical investigation of different neural architectures across a range of federated algorithms, real-world benchmarks, and heterogeneous data splits. Our experiments show that simply replacing convolutional networks with Transformers can greatly reduce catastrophic forgetting of previous devices, accelerate convergence, and reach a better global model, especially when dealing with heterogeneous data. We release our code and pretrained models at https://github.com/Liangqiong/ViT-FL-main to encourage future exploration in robust architectures as an alternative to current research efforts on the optimization front.
    LDPC codes: tracking non-stationary channel noise using sequential variational Bayesian estimates. (arXiv:2204.07037v1 [eess.SP])
    We present a sequential Bayesian learning method for tracking non-stationary signal-to-noise ratios in LDPC codes using probabilistic graphical models. We represent the LDPC code as a cluster graph using a general purpose cluster graph construction algorithm called the layered trees running intersection property (LTRIP) algorithm. The channel noise estimator is a global Gamma cluster, which we extend to allow for Bayesian tracking of non-stationary noise variation. We evaluate our proposed model on real-world 5G drive test data. Our results show that our model is capable of tracking non-stationary channel noise, which outperforms an LDPC code with a fixed knowledge of the actual average channel noise.
    Exploring Dual Encoder Architectures for Question Answering. (arXiv:2204.07120v1 [cs.CL])
    Dual encoders have been used for question-answering (QA) and information retrieval (IR) tasks with good results. There are two major types of dual encoders, Siamese Dual Encoders (SDE), with parameters shared across two encoders, and Asymmetric Dual Encoder (ADE), with two distinctly parameterized encoders. In this work, we explore the dual encoder architectures for QA retrieval tasks. By evaluating on MS MARCO and the MultiReQA benchmark, we show that SDE performs significantly better than ADE. We further propose three different improved versions of ADEs. Based on the evaluation of QA retrieval tasks and direct analysis of the embeddings, we demonstrate that sharing parameters in projection layers would enable ADEs to perform competitively with SDEs.
    Character-focused Video Thumbnail Retrieval. (arXiv:2204.06563v1 [cs.CV])
    We explore retrieving character-focused video frames as candidates for being video thumbnails. To evaluate each frame of the video based on the character(s) present in it, characters (faces) are evaluated in two aspects: Facial-expression: We train a CNN model to measure whether a face has an acceptable facial expression for being in a video thumbnail. This model is trained to distinguish faces extracted from artworks/thumbnails, from faces extracted from random frames of videos. Prominence and interactions: Character(s) in the thumbnail should be important character(s) in the video, to prevent the algorithm from suggesting non-representative frames as candidates. We use face clustering to identify the characters in the video, and form a graph in which the prominence (frequency of appearance) of the character(s), and their interactions (co-occurrence) are captured. We use this graph to infer the relevance of the characters present in each candidate frame. Once every face is scored based on the two criteria above, we infer frame level scores by combining the scores for all the faces within a frame.
    CAMERO: Consistency Regularized Ensemble of Perturbed Language Models with Weight Sharing. (arXiv:2204.06625v1 [cs.CL])
    Model ensemble is a popular approach to produce a low-variance and well-generalized model. However, it induces large memory and inference costs, which are often not affordable for real-world deployment. Existing work has resorted to sharing weights among models. However, when increasing the proportion of the shared weights, the resulting models tend to be similar, and the benefits of using model ensemble diminish. To retain ensemble benefits while maintaining a low memory cost, we propose a consistency-regularized ensemble learning approach based on perturbed models, named CAMERO. Specifically, we share the weights of bottom layers across all models and apply different perturbations to the hidden representations for different models, which can effectively promote the model diversity. Meanwhile, we apply a prediction consistency regularizer across the perturbed models to control the variance due to the model diversity. Our experiments using large language models demonstrate that CAMERO significantly improves the generalization performance of the ensemble model. Specifically, CAMERO outperforms the standard ensemble of 8 BERT-base models on the GLUE benchmark by 0.7 with a significantly smaller model size (114.2M vs. 880.6M).
    Second Order Regret Bounds Against Generalized Expert Sequences under Partial Bandit Feedback. (arXiv:2204.06660v1 [cs.LG])
    We study the problem of expert advice under partial bandit feedback setting and create a sequential minimax optimal algorithm. Our algorithm works with a more general partial monitoring setting, where, in contrast to the classical bandit feedback, the losses can be revealed in an adversarial manner. Our algorithm adopts a universal prediction perspective, whose performance is analyzed with regret against a general expert selection sequence. The regret we study is against a general competition class that covers many settings (such as the switching or contextual experts settings) and the expert selection sequences in the competition class are determined by the application at hand. Our regret bounds are second order bounds in terms of the sum of squared losses and the normalized regret of our algorithm is invariant under arbitrary affine transforms of the loss sequence. Our algorithm is truly online and does not use any preliminary information about the loss sequences.
    A Unified Analysis of Dynamic Interactive Learning. (arXiv:2204.07071v1 [cs.LG])
    In this paper we investigate the problem of learning evolving concepts over a combinatorial structure. Previous work by Emamjomeh-Zadeh et al. [2020] introduced dynamics into interactive learning as a way to model non-static user preferences in clustering problems or recommender systems. We provide many useful contributions to this problem. First, we give a framework that captures both of the models analyzed by [Emamjomeh-Zadeh et al., 2020], which allows us to study any type of concept evolution and matches the same query complexity bounds and running time guarantees of the previous models. Using this general model we solve the open problem of closing the gap between the upper and lower bounds on query complexity. Finally, we study an efficient algorithm where the learner simply follows the feedback at each round, and we provide mistake bounds for low diameter graphs such as cliques, stars, and general o(log n) diameter graphs by using a Markov Chain model.
    Improving Computational Complexity in Statistical Models with Second-Order Information. (arXiv:2202.04219v3 [stat.ML] UPDATED)
    It is known that when the statistical models are singular, i.e., the Fisher information matrix at the true parameter is degenerate, the fixed step-size gradient descent algorithm takes polynomial number of steps in terms of the sample size $n$ to converge to a final statistical radius around the true parameter, which can be unsatisfactory for the application. To further improve that computational complexity, we consider the utilization of the second-order information in the design of optimization algorithms. Specifically, we study the normalized gradient descent (NormGD) algorithm for solving parameter estimation in parametric statistical models, which is a variant of gradient descent algorithm whose step size is scaled by the maximum eigenvalue of the Hessian matrix of the empirical loss function of statistical models. When the population loss function, i.e., the limit of the empirical loss function when $n$ goes to infinity, is homogeneous in all directions, we demonstrate that the NormGD iterates reach a final statistical radius around the true parameter after a logarithmic number of iterations in terms of $n$. Therefore, for fixed dimension $d$, the NormGD algorithm achieves the optimal overall computational complexity $\mathcal{O}(n)$ to reach the final statistical radius. This computational complexity is cheaper than that of the fixed step-size gradient descent algorithm, which is of the order $\mathcal{O}(n^{\tau})$ for some $\tau > 1$, to reach the same statistical radius. We illustrate our general theory under two statistical models: generalized linear models and mixture models, and experimental results support our prediction with general theory.
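The contrast between fixed step-size gradient descent and NormGD can be seen on a one-dimensional toy problem. This is a sketch under assumptions not taken from the paper: the loss $f(\theta) = \theta^4$ stands in for a singular, homogeneous population loss (its Hessian vanishes at the optimum), a fixed tolerance stands in for the statistical radius, and in one dimension the maximum Hessian eigenvalue is simply $f''(\theta)$. Step sizes and names are hypothetical.

```python
def grad(theta):
    """Gradient of the toy loss f(theta) = theta**4."""
    return 4.0 * theta ** 3


def hess(theta):
    """Second derivative of the toy loss; in 1-D this is the max Hessian eigenvalue."""
    return 12.0 * theta ** 2


def gd_update(theta, eta=0.1):
    """Fixed step-size gradient descent."""
    return theta - eta * grad(theta)


def normgd_update(theta, eta=1.0):
    """Gradient step scaled by the largest Hessian eigenvalue (NormGD-style)."""
    return theta - eta * grad(theta) / hess(theta)


def steps_to_tolerance(update, theta0=1.0, tol=1e-2, max_steps=10 ** 6):
    """Count iterations until |theta| <= tol (the stand-in for the statistical radius)."""
    theta, steps = theta0, 0
    while abs(theta) > tol and steps < max_steps:
        theta = update(theta)
        steps += 1
    return steps


gd_steps = steps_to_tolerance(gd_update)
normgd_steps = steps_to_tolerance(normgd_update)
print(f"fixed-step GD: {gd_steps} steps, NormGD: {normgd_steps} steps")
```

On this loss the normalized update reduces to $\theta \leftarrow \theta(1 - \eta/3)$, a geometric contraction, whereas the fixed-step iterates shrink only polynomially in the iteration count because the gradient flattens near the optimum.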
    Sketching Algorithms and Lower Bounds for Ridge Regression. (arXiv:2204.06653v1 [cs.DS])
    We give a sketching-based iterative algorithm that computes $1+\varepsilon$ approximate solutions for the ridge regression problem $\min_x \|{Ax-b}\|_2^2 +\lambda\|{x}\|_2^2$ where $A \in \mathbb{R}^{n \times d}$ with $d \ge n$. Our algorithm, for a constant number of iterations (requiring a constant number of passes over the input), improves upon earlier work of Chowdhury et al., by requiring that the sketching matrix only has a weaker Approximate Matrix Multiplication (AMM) guarantee that depends on $\epsilon$, along with a constant subspace embedding guarantee. The earlier work instead requires that the sketching matrix have a subspace embedding guarantee that depends on $\epsilon$. For example, to produce a $1+\varepsilon$ approximate solution in $1$ iteration, which requires $2$ passes over the input, our algorithm requires the OSNAP embedding to have $m= O(n\sigma^2/\lambda\varepsilon)$ rows with a sparsity parameter $s = O(\log(n))$, whereas the earlier algorithm of Chowdhury et al., with the same number of rows of OSNAP requires a sparsity $s = O(\sqrt{\sigma^2/\lambda\varepsilon} \cdot \log(n))$, where $\sigma = \|{A}\|_2$ is the spectral norm of the matrix $A$. We also show that this algorithm can be used to give faster algorithms for kernel ridge regression. Finally, we show that the sketch size required for our algorithm is essentially optimal for a natural framework of algorithms for ridge regression by proving lower bounds on oblivious sketching matrices for AMM. The sketch size lower bounds for AMM may be of independent interest.
    Activation Regression for Continuous Domain Generalization with Applications to Crop Classification. (arXiv:2204.07030v1 [cs.CV])
    Geographic variance in satellite imagery impacts the ability of machine learning models to generalise to new regions. In this paper, we model geographic generalisation in medium resolution Landsat-8 satellite imagery as a continuous domain adaptation problem, demonstrating how models generalise better with appropriate domain knowledge. We develop a dataset spatially distributed across the entire continental United States, providing macroscopic insight into the effects of geography on crop classification in multi-spectral and temporally distributed satellite imagery. Our method demonstrates improved generalisability from 1) passing geographically correlated climate variables along with the satellite data to a Transformer model and 2) regressing on the model features to reconstruct these domain variables. Combined, we provide a novel perspective on geographic generalisation in satellite imagery and a simple-yet-effective approach to leverage domain knowledge. Code is available at: https://github.com/samar-khanna/cropmap
    Any-resolution Training for High-resolution Image Synthesis. (arXiv:2204.07156v1 [cs.CV])
    Generative models operate at fixed resolution, even though natural images come in a variety of sizes. As high-resolution details are downsampled away, and low-resolution images are discarded altogether, precious supervision is lost. We argue that every pixel matters and create datasets with variable-size images, collected at their native resolutions. Taking advantage of this data is challenging; high-resolution processing is costly, and current architectures can only process fixed-resolution data. We introduce continuous-scale training, a process that samples patches at random scales to train a new generator with variable output resolutions. First, conditioning the generator on a target scale allows us to generate higher resolutions images than previously possible, without adding layers to the model. Second, by conditioning on continuous coordinates, we can sample patches that still obey a consistent global layout, which also allows for scalable training at higher resolutions. Controlled FFHQ experiments show our method takes advantage of the multi-resolution training data better than discrete multi-scale approaches, achieving better FID scores and cleaner high-frequency details. We also train on other natural image domains including churches, mountains, and birds, and demonstrate arbitrary scale synthesis with both coherent global layouts and realistic local details, going beyond 2K resolution in our experiments. Our project page is available at: https://chail.github.io/anyres-gan/.
    Neonatal Bowel Sound Detection Using Convolutional Neural Network and Laplace Hidden Semi-Markov Model. (arXiv:2108.07467v2 [cs.SD] UPDATED)
    Abdominal auscultation is a convenient, safe and inexpensive method to assess bowel conditions, which is essential in neonatal care. It helps early detection of neonatal bowel dysfunctions and allows timely intervention. This paper presents a neonatal bowel sound detection method to assist the auscultation. Specifically, a Convolutional Neural Network (CNN) is proposed to classify peristalsis and non-peristalsis sounds. The classification is then optimized using a Laplace Hidden Semi-Markov Model (HSMM). The proposed method is validated on abdominal sounds from 49 newborn infants admitted to our tertiary Neonatal Intensive Care Unit (NICU). The results show that the method can effectively detect bowel sounds with accuracy and area under curve (AUC) score being 89.81% and 83.96% respectively, outperforming 13 baseline methods. Furthermore, the proposed Laplace HSMM refinement strategy is proven capable to enhance other bowel sound detection models. The outcomes of this work have the potential to facilitate future telehealth applications for neonatal care. The source code of our work can be found at: https://bitbucket.org/chirudeakin/neonatal-bowel-sound-classification/
    Optimal Training of Fair Predictive Models. (arXiv:1910.04109v3 [stat.ML] UPDATED)
    Recently there has been sustained interest in modifying prediction algorithms to satisfy fairness constraints. These constraints are typically complex nonlinear functionals of the observed data distribution. Focusing on the path-specific causal constraints proposed by Nabi and Shpitser (2018), we introduce new theoretical results and optimization techniques to make model training easier and more accurate. Specifically, we show how to reparameterize the observed data likelihood such that fairness constraints correspond directly to parameters that appear in the likelihood, transforming a complex constrained optimization objective into a simple optimization problem with box constraints. We also exploit methods from empirical likelihood theory in statistics to improve predictive performance by constraining baseline covariates, without requiring parametric models. We combine the merits of both proposals to optimize a hybrid reparameterized likelihood. The techniques presented here should be applicable more broadly to fair prediction proposals that impose constraints on predictive models.
    Masked Siamese Networks for Label-Efficient Learning. (arXiv:2204.07141v1 [cs.LG])
    We propose Masked Siamese Networks (MSN), a self-supervised learning framework for learning image representations. Our approach matches the representation of an image view containing randomly masked patches to the representation of the original unmasked image. This self-supervised pre-training strategy is particularly scalable when applied to Vision Transformers since only the unmasked patches are processed by the network. As a result, MSNs improve the scalability of joint-embedding architectures, while producing representations of a high semantic level that perform competitively on low-shot image classification. For instance, on ImageNet-1K, with only 5,000 annotated images, our base MSN model achieves 72.4% top-1 accuracy, and with 1% of ImageNet-1K labels, we achieve 75.7% top-1 accuracy, setting a new state-of-the-art for self-supervised learning on this benchmark. Our code is publicly available.
    Activation Map Adaptation for Effective Knowledge Distillation. (arXiv:2010.13500v2 [cs.CV] UPDATED)
    Model compression has become a recent trend due to the requirement of deploying neural networks on embedded and mobile devices. Hence, both accuracy and efficiency are of critical importance. To explore a balance between them, a knowledge distillation strategy is proposed for general visual representation learning. It utilizes our well-designed activation map adaptive module to replace some blocks of the teacher network, exploring the most appropriate supervisory features adaptively during the training process. The teacher's hidden-layer output is used to guide the training of the student network so as to transfer effective semantic information. To verify the effectiveness of our strategy, this paper applies our method to the CIFAR-10 dataset. Results demonstrate that the method can boost the accuracy of the student network by 0.6% with 6.5% loss reduction, and significantly improve its training speed.
    Shedding New Light on the Language of the Dark Web. (arXiv:2204.06885v1 [cs.CL])
    The hidden nature and the limited accessibility of the Dark Web, combined with the lack of public datasets in this domain, make it difficult to study its inherent characteristics such as linguistic properties. Previous works on text classification of Dark Web domain have suggested that the use of deep neural models may be ineffective, potentially due to the linguistic differences between the Dark and Surface Webs. However, not much work has been done to uncover the linguistic characteristics of the Dark Web. This paper introduces CoDA, a publicly available Dark Web dataset consisting of 10000 web documents tailored towards text-based Dark Web analysis. By leveraging CoDA, we conduct a thorough linguistic analysis of the Dark Web and examine the textual differences between the Dark Web and the Surface Web. We also assess the performance of various methods of Dark Web page classification. Finally, we compare CoDA with an existing public Dark Web dataset and evaluate their suitability for various use cases.
    Estimating Structural Disparities for Face Models. (arXiv:2204.06562v1 [cs.CV])
    In machine learning, disparity metrics are often defined by measuring the difference in the performance or outcome of a model, across different sub-populations (groups) of datapoints. Thus, the inputs to disparity quantification consist of a model's predictions $\hat{y}$, the ground-truth labels for the predictions $y$, and group labels $g$ for the data points. Performance of the model for each group is calculated by comparing $\hat{y}$ and $y$ for the datapoints within a specific group, and as a result, disparity of performance across the different groups can be calculated. In many real world scenarios however, group labels ($g$) may not be available at scale during training and validation time, or collecting them might not be feasible or desirable as they could often be sensitive information. As a result, evaluating disparity metrics across categorical groups would not be feasible. On the other hand, in many scenarios noisy groupings may be obtainable using some form of a proxy, which would allow measuring disparity metrics across sub-populations. Here we explore performing such analysis on computer vision models trained on human faces, and on tasks such as face attribute prediction and affect estimation. Our experiments indicate that embeddings resulting from an off-the-shelf face recognition model, could meaningfully serve as a proxy for such estimation.
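    The disparity computation described above (compare $\hat{y}$ with $y$ inside each group $g$, then compare groups) can be sketched in a few lines. This is only an illustration: the function names, the choice of accuracy as the performance measure, and the max-minus-min gap are assumptions, not the paper's metric. The group labels could equally be noisy proxy groupings.

```python
from collections import defaultdict

def group_accuracies(y_true, y_pred, groups):
    """Accuracy of the predictions within each group label."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, y_hat, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(y == y_hat)
    return {g: correct[g] / total[g] for g in total}

def accuracy_disparity(y_true, y_pred, groups):
    """Gap between the best- and worst-served group (an assumed summary)."""
    acc = group_accuracies(y_true, y_pred, groups)
    return max(acc.values()) - min(acc.values())

# Toy data: hypothetical labels, predictions, and group assignments.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

print(group_accuracies(y_true, y_pred, groups))
print(accuracy_disparity(y_true, y_pred, groups))
```

    On this toy data group "a" is classified correctly 3 times out of 4 and group "b" 2 times out of 4, so the gap is 0.25.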
    ExPLoit: Extracting Private Labels in Split Learning. (arXiv:2112.01299v2 [cs.CR] UPDATED)
    Split learning is a popular technique used for vertical federated learning (VFL), where the goal is to jointly train a model on the private input and label data held by two parties. This technique uses a split-model, trained end-to-end, by exchanging the intermediate representations (IR) of the inputs and gradients of the IR between the two parties. We propose ExPLoit - a label-leakage attack that allows an adversarial input-owner to extract the private labels of the label-owner during split-learning. ExPLoit frames the attack as a supervised learning problem by using a novel loss function that combines gradient-matching and several regularization terms developed using key properties of the dataset and models. Our evaluations show that ExPLoit can uncover the private labels with near-perfect accuracy of up to 99.96%. Our findings underscore the need for better training techniques for VFL.
    A Study of Causal Confusion in Preference-Based Reward Learning. (arXiv:2204.06601v1 [cs.LG])
    Learning robot policies via preference-based reward learning is an increasingly popular method for customizing robot behavior. However, in recent years, there has been a growing body of anecdotal evidence that learning reward functions from preferences is prone to spurious correlations and reward gaming or hacking behaviors. While there is much anecdotal, empirical, and theoretical analysis of causal confusion and reward gaming behaviors both in reinforcement learning and imitation learning approaches that directly map from states to actions, we provide the first systematic study of causal confusion in the context of learning reward functions from preferences. To facilitate this study, we identify a set of three preference learning benchmark domains where we observe causal confusion when learning from offline datasets of pairwise trajectory preferences: a simple reacher domain, an assistive feeding domain, and an itch-scratching domain. To gain insight into this observed causal confusion, we present a sensitivity analysis that explores the effect of different factors--including the type of training data, reward model capacity, and feature dimensionality--on the robustness of rewards learned from preferences. We find evidence that learning rewards from pairwise trajectory preferences is highly sensitive and non-robust to spurious features and increasing model capacity, but not as sensitive to the type of training data. Videos, code, and supplemental results are available at https://sites.google.com/view/causal-reward-confusion.
    deep-significance - Easy and Meaningful Statistical Significance Testing in the Age of Neural Networks. (arXiv:2204.06815v1 [cs.LG])
    A lot of Machine Learning (ML) and Deep Learning (DL) research is of an empirical nature. Nevertheless, statistical significance testing (SST) is still not widely used. This endangers true progress, as seeming improvements over a baseline might be statistical flukes, leading follow-up research astray while wasting human and computational resources. Here, we provide an easy-to-use package containing different significance tests and utility functions specifically tailored towards research needs and usability.
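    To make the point about statistical flukes concrete, here is a minimal sketch of a generic two-sample permutation test on per-seed scores. It is not the deep-significance API, and the tests the package actually provides may differ; the function name, the one-sided formulation, and the example scores are all assumptions made for illustration.

```python
import random

def permutation_test(scores_a, scores_b, n_resamples=10000, seed=0):
    """Estimate a one-sided p-value for mean(scores_a) > mean(scores_b)."""
    rng = random.Random(seed)
    n_a = len(scores_a)
    observed = sum(scores_a) / n_a - sum(scores_b) / len(scores_b)
    pooled = list(scores_a) + list(scores_b)
    n_b = len(pooled) - n_a
    at_least_as_extreme = 0
    for _ in range(n_resamples):
        # Randomly reassign scores to the two systems.
        rng.shuffle(pooled)
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one smoothing keeps the estimate strictly positive.
    return (at_least_as_extreme + 1) / (n_resamples + 1)

# Hypothetical accuracies from five random seeds per system.
new_model = [0.90, 0.91, 0.92, 0.93, 0.94]
baseline = [0.80, 0.81, 0.82, 0.83, 0.84]
print(permutation_test(new_model, baseline))
```

    A small p-value suggests the gap is unlikely to come from seed noise alone; with only a handful of runs per system, the smallest achievable p-value is limited by the number of distinct reassignments.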
    To Split or Not to Split: The Impact of Disparate Treatment in Classification. (arXiv:2002.04788v4 [cs.LG] UPDATED)
    Disparate treatment occurs when a machine learning model yields different decisions for individuals based on a sensitive attribute (e.g., age, sex). In domains where prediction accuracy is paramount, it could potentially be acceptable to fit a model which exhibits disparate treatment. To evaluate the effect of disparate treatment, we compare the performance of split classifiers (i.e., classifiers trained and deployed separately on each group) with group-blind classifiers (i.e., classifiers which do not use a sensitive attribute). We introduce the benefit-of-splitting for quantifying the performance improvement by splitting classifiers. Computing the benefit-of-splitting directly from its definition could be intractable since it involves solving optimization problems over an infinite-dimensional functional space. Under different performance measures, we (i) prove an equivalent expression for the benefit-of-splitting which can be efficiently computed by solving small-scale convex programs; (ii) provide sharp upper and lower bounds for the benefit-of-splitting which reveal precise conditions where a group-blind classifier will always suffer from a non-trivial performance gap from the split classifiers. In the finite sample regime, splitting is not necessarily beneficial and we provide data-dependent bounds to understand this effect. Finally, we validate our theoretical results through numerical experiments on both synthetic and real-world datasets.
    OpenCSI: An Open-Source Dataset for Indoor Localization Using CSI-Based Fingerprinting. (arXiv:2104.07963v3 [eess.SP] UPDATED)
    Many applications require accurate indoor localization. Fingerprint-based localization methods propose a solution to this problem, but rely on a radio map that is effort-intensive to acquire. We automate the radio map acquisition phase using a software-defined radio (SDR) and a wheeled robot. Furthermore, we open-source a radio map acquired with our automated tool for a 3GPP Long-Term Evolution (LTE) wireless link. To the best of our knowledge, this is the first publicly available radio map containing channel state information (CSI). Finally, we describe first localization experiments on this radio map using a convolutional neural network to regress for location coordinates.
    Tight Bounds for Quantum State Certification with Incoherent Measurements. (arXiv:2204.07155v1 [quant-ph])
    We consider the problem of quantum state certification, where we are given the description of a mixed state $\sigma \in \mathbb{C}^{d \times d}$, $n$ copies of a mixed state $\rho \in \mathbb{C}^{d \times d}$, and $\varepsilon > 0$, and we are asked to determine whether $\rho = \sigma$ or whether $\| \rho - \sigma \|_1 > \varepsilon$. When $\sigma$ is the maximally mixed state $\frac{1}{d} I_d$, this is known as mixedness testing. We focus on algorithms which use incoherent measurements, i.e. which only measure one copy of $\rho$ at a time. Unlike those that use entangled, multi-copy measurements, these can be implemented without persistent quantum memory and thus represent a large class of protocols that can be run on current or near-term devices. For mixedness testing, there is a folklore algorithm which uses incoherent measurements and only needs $O(d^{3/2} / \varepsilon^2)$ copies. The algorithm is non-adaptive, that is, its measurements are fixed ahead of time, and is known to be optimal for non-adaptive algorithms. However, when the algorithm can make arbitrary incoherent measurements, the best known lower bound is only $\Omega (d^{4/3} / \varepsilon^2)$ [Bubeck-Chen-Li '20], and it has been an outstanding open problem to close this polynomial gap. In this work, 1) we settle the copy complexity of mixedness testing with incoherent measurements and show that $\Omega (d^{3/2} / \varepsilon^2)$ copies are necessary, and 2) we show the instance-optimal bounds for state certification to general $\sigma$ first derived by [Chen-Li-O'Donnell '21] for non-adaptive measurements also hold for arbitrary incoherent measurements. Qualitatively, our results say that adaptivity does not help at all for these problems. Our results are based on new techniques that allow us to reduce the problem to understanding certain matrix martingales, which we believe may be of independent interest.
    CLUES: A Benchmark for Learning Classifiers using Natural Language Explanations. (arXiv:2204.07142v1 [cs.CL])
    Supervised learning has traditionally focused on inductive learning by observing labeled examples of a task. In contrast, humans have the ability to learn new concepts from language. Here, we explore training zero-shot classifiers for structured data purely from language. For this, we introduce CLUES, a benchmark for Classifier Learning Using natural language ExplanationS, consisting of a range of classification tasks over structured data along with natural language supervision in the form of explanations. CLUES consists of 36 real-world and 144 synthetic classification tasks. It contains crowdsourced explanations describing real-world tasks from multiple teachers and programmatically generated explanations for the synthetic tasks. To model the influence of explanations in classifying an example, we develop ExEnt, an entailment-based model that learns classifiers using explanations. ExEnt generalizes up to 18% better (relative) on novel tasks than a baseline that does not use explanations. We delineate key challenges for automated learning from explanations, addressing which can lead to progress on CLUES in the future. Code and datasets are available at: https://clues-benchmark.github.io.
    Modeling the effects of environmental and perceptual uncertainty using deterministic reinforcement learning dynamics with partial observability. (arXiv:2109.07259v2 [nlin.AO] UPDATED)
    Assessing the systemic effects of uncertainty that arises from agents' partial observation of the true states of the world is critical for understanding a wide range of scenarios. Yet, previous modeling work on agent learning and decision-making either lacks a systematic way to describe this source of uncertainty or puts the focus on obtaining optimal policies using complex models of the world that would impose an unrealistically high cognitive demand on real agents. In this work we aim to efficiently describe the emergent behavior of biologically plausible and parsimonious learning agents faced with partially observable worlds. Therefore we derive and present deterministic reinforcement learning dynamics where the agents observe the true state of the environment only partially. We showcase the broad applicability of our dynamics across different classes of partially observable agent-environment systems. We find that partial observability creates unintuitive benefits in a number of specific contexts, pointing the way to further research on a general understanding of such effects. For instance, partially observant agents can learn better outcomes faster, in a more stable way and even overcome social dilemmas. Furthermore, our method allows the application of dynamical systems theory to partially observable multiagent learning. In this regard we find the emergence of catastrophic limit cycles, a critical slowing down of the learning processes between reward regimes and the separation of the learning dynamics into fast and slow directions, all caused by partial observability. Therefore, the presented dynamics have the potential to become a formal, yet practical, lightweight and robust tool for researchers in biology, social science and machine learning to systematically investigate the effects of interacting partially observant agents.
    Multimodal spatiotemporal graph neural networks for improved prediction of 30-day all-cause hospital readmission. (arXiv:2204.06766v1 [cs.LG])
    Measures to predict 30-day readmission are considered an important quality factor for hospitals as accurate predictions can reduce the overall cost of care by identifying high risk patients before they are discharged. While recent deep learning-based studies have shown promising empirical results on readmission prediction, several limitations exist that may hinder widespread clinical utility, such as (a) only patients with certain conditions are considered, (b) existing approaches do not leverage data temporality, (c) individual admissions are assumed independent of each other, which is unrealistic, (d) prior studies are usually limited to single source of data and single center data. To address these limitations, we propose a multimodal, modality-agnostic spatiotemporal graph neural network (MM-STGNN) for prediction of 30-day all-cause hospital readmission that fuses multimodal in-patient longitudinal data. By training and evaluating our methods using longitudinal chest radiographs and electronic health records from two independent centers, we demonstrate that MM-STGNN achieves AUROC of 0.79 on both primary and external datasets. Furthermore, MM-STGNN significantly outperforms the current clinical reference standard, LACE+ score (AUROC=0.61), on the primary dataset. For subset populations of patients with heart and vascular disease, our model also outperforms baselines on predicting 30-day readmission (e.g., 3.7 point improvement in AUROC in patients with heart disease). Lastly, qualitative model interpretability analysis indicates that while patients' primary diagnoses were not explicitly used to train the model, node features crucial for model prediction directly reflect patients' primary diagnoses. Importantly, our MM-STGNN is agnostic to node feature modalities and could be utilized to integrate multimodal data for triaging patients in various downstream resource allocation tasks.
    EvoSTS Forecasting: Evolutionary Sparse Time-Series Forecasting. (arXiv:2204.07066v1 [cs.NE])
    In this work, we highlight our novel evolutionary sparse time-series forecasting algorithm, also known as EvoSTS. The algorithm attempts to evolutionarily prioritize the weights of a Long Short-Term Memory (LSTM) network that best minimize the reconstruction loss of a predicted signal using a learned sparse coded dictionary. In each generation of our evolutionary algorithm, a set number of children with the same initial weights are spawned. Each child undergoes a training step and adjusts its weights on the same data. Due to stochastic back-propagation, the set of children has a variety of weights with different levels of performance. The weights that best minimize the reconstruction loss with a given signal dictionary are passed to the next generation. The predictions from the best-performing weights of the first and last generation are compared. We found improvements while comparing the weights of these two generations. However, due to several confounding parameters and hyperparameter limitations, some of the weights had negligible improvements. To the best of our knowledge, this is the first attempt to use sparse coding in this way to optimize time series forecasting model weights, such as those of an LSTM network.
    A deep learning algorithm for reducing false positives in screening mammography. (arXiv:2204.06671v1 [cs.CV])
    Screening mammography improves breast cancer outcomes by enabling early detection and treatment. However, false positive callbacks for additional imaging from screening exams cause unnecessary procedures, patient anxiety, and financial burden. This work demonstrates an AI algorithm that reduces false positives by identifying mammograms not suspicious for breast cancer. We trained the algorithm to determine the absence of cancer using 123,248 2D digital mammograms (6,161 cancers) and performed a retrospective study on 14,831 screening exams (1,026 cancers) from 15 US and 3 UK sites. Retrospective evaluation of the algorithm on the largest of the US sites (11,592 mammograms, 101 cancers) a) left the cancer detection rate unaffected (p=0.02, non-inferiority margin 0.25 cancers per 1000 exams), b) reduced callbacks for diagnostic exams by 31.1% compared to standard clinical readings, c) reduced benign needle biopsies by 7.4%, and d) reduced screening exams requiring radiologist interpretation by 41.6% in the simulated clinical workflow. This work lays the foundation for semi-autonomous breast cancer screening systems that could benefit patients and healthcare systems by reducing false positives, unnecessary procedures, patient anxiety, and expenses.
    SNP2Vec: Scalable Self-Supervised Pre-Training for Genome-Wide Association Study. (arXiv:2204.06699v1 [cs.LG])
    Self-supervised pre-training methods have brought remarkable breakthroughs in the understanding of text, image, and speech. Recent developments in genomics have also adopted these pre-training methods for genome understanding. However, they focus only on understanding haploid sequences, which hinders their applicability towards understanding genetic variations, also known as single nucleotide polymorphisms (SNPs), which is crucial for genome-wide association studies. In this paper, we introduce SNP2Vec, a scalable self-supervised pre-training approach for understanding SNPs. We apply SNP2Vec to perform long-sequence genomics modeling, and we evaluate the effectiveness of our approach on predicting Alzheimer's disease risk in a Chinese cohort. Our approach significantly outperforms existing polygenic risk score methods and all other baselines, including the model that is trained entirely with haploid sequences. We release our code and dataset on https://github.com/HLTCHKUST/snp2vec.
    Data Augmentation for Bayesian Deep Learning. (arXiv:1903.09668v3 [stat.ML] UPDATED)
    Deep Learning (DL) methods have emerged as one of the most powerful tools for functional approximation and prediction. While the representation properties of DL have been well studied, uncertainty quantification remains challenging and largely unexplored. Data augmentation techniques are a natural approach to provide uncertainty quantification and to incorporate stochastic Monte Carlo search into stochastic gradient descent (SGD) methods. The purpose of our paper is to show that training DL architectures with data augmentation leads to efficiency gains. We use the theory of scale mixtures of normals to derive data augmentation strategies for deep learning. This allows variants of the expectation-maximization and MCMC algorithms to be brought to bear on these high dimensional nonlinear deep learning models. To demonstrate our methodology, we develop data augmentation algorithms for a variety of commonly used activation functions: logit, ReLU, leaky ReLU and SVM. Our methodology is compared to traditional stochastic gradient descent with back-propagation. Our optimization procedure leads to a version of iteratively re-weighted least squares and can be implemented at scale with accelerated linear algebra methods providing substantial improvement in speed. We illustrate our methodology on a number of standard datasets. Finally, we conclude with directions for future research.
    Twitter User Representation Using Weakly Supervised Graph Embedding. (arXiv:2108.08988v3 [cs.CL] UPDATED)
    Social media platforms provide convenient means for users to participate in multiple online activities on various contents and create fast widespread interactions. However, this rapidly growing access has also increased the diversity of information, and characterizing user types to understand people's lifestyle decisions shared in social media is challenging. In this paper, we propose a weakly supervised graph embedding based framework for understanding user types. We evaluate the user embedding learned using weak supervision over well-being related tweets from Twitter, focusing on 'Yoga' and 'Keto diet'. Experiments on real-world datasets demonstrate that the proposed framework outperforms the baselines for detecting user types. Finally, we illustrate data analysis on different types of users (e.g., practitioner vs. promotional) from our dataset. While we focus on lifestyle-related tweets (i.e., yoga, keto), our method for constructing user representation readily generalizes to other domains.
    ICSML: Industrial Control Systems Machine Learning Inference Framework natively executing on IEC 61131-3 compliant devices. (arXiv:2202.10075v2 [cs.LG] UPDATED)
    Industrial Control Systems (ICS) have played a catalytic role in enabling the 4th Industrial Revolution. ICS devices like Programmable Logic Controllers (PLCs) automate, monitor, and control critical processes in industrial, energy, and commercial environments. The convergence of traditional Operational Technology (OT) with Information Technology (IT) has opened a new and unique threat landscape. This has inspired defense research that focuses heavily on Machine Learning (ML) based anomaly detection methods that run on external IT hardware, which means an increase in costs and the further expansion of the threat landscape. To remove this requirement, we introduce the ICS machine learning inference framework (ICSML) which enables the execution of ML model inference natively on the PLC. ICSML is implemented in IEC 61131-3 code and provides several optimizations to bypass the limitations imposed by the domain-specific languages. Therefore, it works on every PLC without the need for vendor support. ICSML provides a complete set of components for the creation of full ML models similarly to established ML frameworks. We run a series of benchmarks studying memory and performance and compare our solution to the TFLite inference framework. At the same time, we develop domain-specific model optimizations to improve the efficiency of ICSML. To demonstrate the abilities of ICSML, we evaluate a case study of a real defense for process-aware attacks targeting a desalination plant.
    Measurement-based Admission Control in Sliced Networks: A Best Arm Identification Approach. (arXiv:2204.06910v1 [cs.NI])
    In sliced networks, the shared tenancy of slices requires adaptive admission control of data flows, based on measurements of network resources. In this paper, we investigate the design of measurement-based admission control schemes, deciding whether a new data flow can be admitted and in this case, on which slice. The objective is to devise a joint measurement and decision strategy that returns a correct decision (e.g., the least loaded slice) with a certain level of confidence while minimizing the measurement cost (the number of measurements made before committing to the decision). We study the design of such strategies for several natural admission criteria specifying what a correct decision is. For each of these criteria, using tools from best arm identification in bandits, we first derive an explicit information-theoretical lower bound on the cost of any algorithm returning the correct decision with fixed confidence. We then devise a joint measurement and decision strategy achieving this theoretical limit. We compare empirically the measurement costs of these strategies, and compare them both to the lower bounds as well as a naive measurement scheme. We find that our algorithm significantly outperforms the naive scheme (by a factor $2-8$).
    Surface Similarity Parameter: A New Machine Learning Loss Metric for Oscillatory Spatio-Temporal Data. (arXiv:2204.06843v1 [cs.LG])
    Supervised machine learning approaches require the formulation of a loss functional to be minimized in the training phase. Sequential data are ubiquitous across many fields of research, and are often treated with Euclidean distance-based loss functions that were designed for tabular data. For smooth oscillatory data, those conventional approaches lack the ability to penalize amplitude, frequency and phase prediction errors at the same time, and tend to be biased towards amplitude errors. We introduce the surface similarity parameter (SSP) as a novel loss function that is especially useful for training machine learning models on smooth oscillatory sequences. Our extensive experiments on chaotic spatio-temporal dynamical systems indicate that the SSP is beneficial for shaping gradients, thereby accelerating the training process, reducing the final prediction error, and implementing a stronger regularization effect compared to using classical loss functions. The results indicate the potential of the novel loss metric particularly for highly complex and chaotic data, such as data stemming from the nonlinear two-dimensional Kuramoto-Sivashinsky equation and the linear propagation of dispersive surface gravity waves in fluids.
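    The abstract does not spell out the SSP formula. The sketch below assumes the form in which the SSP is usually given, a normalized L2 distance between two signals, written here in the signal domain (by Parseval's theorem this matches a Fourier-domain definition up to discretization). The function name and test signals are illustrative, and the paper's exact formulation may differ.

```python
import math

def surface_similarity(y_true, y_pred):
    """Normalized L2 error in [0, 1]; 0 means the two signals are identical."""
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)))
    norm = (math.sqrt(sum(a * a for a in y_true))
            + math.sqrt(sum(b * b for b in y_pred)))
    # Two all-zero signals are treated as identical.
    return diff / norm if norm > 0 else 0.0

n = 64
reference = [math.sin(2 * math.pi * i / n) for i in range(n)]
half_amplitude = [0.5 * v for v in reference]   # pure amplitude error
inverted = [-v for v in reference]              # half-period phase error

print(surface_similarity(reference, reference))
print(surface_similarity(reference, half_amplitude))
print(surface_similarity(reference, inverted))
```

    Under this assumed definition, a perfect prediction scores 0, halving the amplitude scores 1/3, and a fully inverted signal scores the maximum of 1, so amplitude and phase errors are penalized on a common bounded scale.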
    Exploring the Distributed Knowledge Congruence in Proxy-data-free Federated Distillation. (arXiv:2204.07028v1 [cs.LG])
    Federated learning (FL) is a distributed machine learning paradigm in which the server periodically aggregates local model parameters from clients without assembling their private data. User-constrained communication bandwidth and the requirement for personalized models pose severe challenges to FL. Federated distillation (FD) is proposed to simultaneously address the two problems, which exchanges knowledge between the server and clients, supporting heterogeneous local models while significantly reducing communication overhead. However, most existing FD methods require a proxy dataset, which is often unavailable. Proxy-data-free FD approaches eliminate the need for additional public data beyond clients' private data, but suffer from remarkable discrepancy among local knowledge due to model heterogeneity, leading to ambiguous representation on the server and inevitable accuracy degradation. To tackle this issue, we propose a proxy-data-free FD algorithm based on distributed knowledge congruence (FedDKC). FedDKC leverages well-designed refinement strategies to narrow local knowledge differences into an acceptable upper bound to mitigate the negative effects of knowledge incongruence. Specifically, from perspectives of peak probability and Shannon entropy of local knowledge, we design kernel-based knowledge refinement (KKR) and searching-based knowledge refinement (SKR) respectively, and theoretically guarantee the refined-local knowledge can satisfy an approximately-similar distribution and be regarded as congruent. Extensive experiments conducted on three common datasets demonstrate that our proposed FedDKC method outperforms the state-of-the-art in 93.33% of comparisons, and achieves faster convergence without increasing communication overhead.
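    The two statistics named above, peak probability and Shannon entropy of local knowledge, are straightforward to compute for a client's softmax output. The sketch below only illustrates how heterogeneous local models can disagree on both quantities for the same sample; it does not implement KKR or SKR, and the logits and function names are made up for the example.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def peak_probability(probs):
    """Largest class probability in the distribution."""
    return max(probs)

def shannon_entropy(probs):
    """Entropy in nats; zero-probability classes contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical logits from two clients' local models for the same sample.
confident_client = softmax([4.0, 1.0, 0.0])
diffuse_client = softmax([1.2, 1.0, 0.8])

for name, probs in [("confident", confident_client), ("diffuse", diffuse_client)]:
    print(name, peak_probability(probs), shannon_entropy(probs))
```

    The confident client has a higher peak probability and lower entropy than the diffuse one, which is the kind of discrepancy a refinement step would need to narrow before aggregation.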
    Assessing the communication gap between AI models and healthcare professionals: explainability, utility and trust in AI-driven clinical decision-making. (arXiv:2204.05030v2 [cs.AI] UPDATED)
    This paper contributes a pragmatic evaluation framework for explainable Machine Learning (ML) models for clinical decision support. The study revealed a more nuanced role for ML explanation models, when these are pragmatically embedded in the clinical context. Despite the general positive attitude of healthcare professionals (HCPs) towards explanations as a safety and trust mechanism, for a significant set of participants there were negative effects associated with confirmation bias, accentuating model over-reliance and increased effort to interact with the model. Also, contradicting one of its main intended functions, standard explanatory models showed limited ability to support a critical understanding of the limitations of the model. However, we found new significant positive effects which reposition the role of explanations within a clinical context: these include reduction of automation bias, addressing ambiguous clinical cases (cases where HCPs were not certain about their decision) and support of less experienced HCPs in the acquisition of new domain knowledge.
    The multi-modal universe of fast-fashion: the Visuelle 2.0 benchmark. (arXiv:2204.06972v1 [cs.CV])
    We present Visuelle 2.0, the first dataset useful for facing diverse prediction problems that a fast-fashion company has to manage routinely. Furthermore, we demonstrate how the use of computer vision is substantial in this scenario. Visuelle 2.0 contains data for 6 seasons / 5355 clothing products of Nuna Lie, a famous Italian company with hundreds of shops located in different areas within the country. In particular, we focus on a specific prediction problem, namely short-observation new product sale forecasting (SO-fore). SO-fore assumes that the season has started and a set of new products is on the shelves of the different stores. The goal is to forecast the sales for a particular horizon, given a short, available past (few weeks), since no earlier statistics are available. To be successful, SO-fore approaches should capture this short past and exploit other modalities or exogenous data. To these aims, Visuelle 2.0 is equipped with disaggregated data at the item-shop level and multi-modal information for each clothing item, allowing computer vision approaches to come into play. The main message that we deliver is that the use of image data with deep networks boosts performances obtained when using the time series in long-term forecasting scenarios, ameliorating the WAPE by 8.2% and the MAE by 7.7%. The dataset is available at: https://humaticslab.github.io/forecasting/visuelle.
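    For readers unfamiliar with the two reported error measures, here is a minimal sketch of MAE and WAPE under their standard definitions (WAPE as total absolute error divided by total absolute actual sales). The sales figures are invented, and the paper's exact normalization may differ.

```python
def mae(actual, forecast):
    """Mean absolute error, in the same units as the series."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def wape(actual, forecast):
    """Weighted absolute percentage error, in percent."""
    total_actual = sum(abs(a) for a in actual)
    total_error = sum(abs(a - f) for a, f in zip(actual, forecast))
    return 100.0 * total_error / total_actual

# Hypothetical weekly sales of one item and a forecast for the same weeks.
actual = [10, 20, 30, 40]
forecast = [12, 18, 33, 35]

print(mae(actual, forecast))   # 3.0
print(wape(actual, forecast))  # 12.0
```

    Because WAPE divides by total sales rather than per-week sales, weeks with very low volume do not dominate the score, which is useful for sparse item-shop level data.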
    Optimal Stopping via Randomized Neural Networks. (arXiv:2104.13669v2 [stat.ML] UPDATED)
    This paper presents new machine learning approaches to approximate the solutions of optimal stopping problems. The key idea of these methods is to use neural networks, where the parameters of the hidden layers are generated randomly and only the last layer is trained, in order to approximate the continuation value. Our approaches are applicable to high dimensional problems where the existing approaches become increasingly impractical. In addition, since our approaches can be optimized using simple linear regression, they are easy to implement and theoretical guarantees are provided. Our randomized reinforcement learning approach and randomized recurrent neural network approach outperform the state-of-the-art and other relevant machine learning approaches in Markovian and non-Markovian examples, respectively. In particular, we test our approaches on Black-Scholes, Heston, rough Heston and fractional Brownian motion. Moreover, we show that they can also be used to efficiently compute Greeks of American options.
    HCR-Net: A deep learning based script independent handwritten character recognition network. (arXiv:2108.06663v2 [cs.CV] UPDATED)
    Despite being studied extensively for a few decades, handwritten character recognition (HCR) is considered a challenging learning problem in pattern recognition and there is very limited research on script independent models. This is mainly because of the diversity of scripts, the focus of conventional research on handcrafted feature extraction techniques, and the unavailability of public datasets and code to reproduce the results. On the other hand, deep learning has witnessed huge success in different areas of pattern recognition, including HCR, and provides end-to-end learning, but it has been studied for specific scripts only. In this paper, we propose a novel deep learning architecture, called HCR-Net, which exploits transfer learning and image-augmentation for end-to-end learning for script independent handwritten character recognition. HCR-Net is based on a novel transfer learning approach for HCR, where some of the lower layers of a pre-trained network are utilized. Due to transfer learning and image-augmentation, HCR-Net provides faster training, better performance and better generalization, and can achieve up to 99% of its final accuracy in just the first epoch. The experimental results on publicly available datasets of Bangla, Punjabi, Hindi, English, Swedish, Urdu, Farsi, Tibetan, Kannada, Malayalam, Telugu, Marathi, Nepali and Arabic languages prove the efficacy of HCR-Net and establish several new benchmarks. For reproducibility of the results and for the advancement of HCR research, complete code is publicly released at https://github.com/jmdvinodjmd/HCR-Net.
    A Study of Low-Resource Speech Commands Recognition based on Adversarial Reprogramming. (arXiv:2110.03894v2 [eess.AS] UPDATED)
    In this study, we propose a novel adversarial reprogramming (AR) approach for low-resource spoken command recognition (SCR), and build an AR-SCR system. The AR procedure aims to modify the acoustic signals (from the target domain) to repurpose a pretrained SCR model (from the source domain). To solve the label mismatches between source and target domains, and further improve the stability of AR, we propose a novel similarity-based label mapping technique to align classes. In addition, the transfer learning (TL) technique is combined with the original AR process to improve the model adaptation capability. We evaluate the proposed AR-SCR system on three low-resource SCR datasets, including Arabic, Lithuanian, and dysarthric Mandarin speech. Experimental results show that with a pretrained AM trained on a large-scale English dataset, the proposed AR-SCR system outperforms the current state-of-the-art results on Arabic and Lithuanian speech commands datasets, with only a limited amount of training data.
    Regret, stability & fairness in matching markets with bandit learners. (arXiv:2102.06246v2 [cs.LG] UPDATED)
    Making an informed decision -- for example, when choosing a career or housing -- requires knowledge about the available options. Such knowledge is generally acquired through costly trial and error, but this learning process can be disrupted by competition. In this work, we study how competition affects the long-term outcomes of individuals as they learn. We build on a line of work that models this setting as a two-sided matching market with bandit learners. A recent result in this area states that it is impossible to simultaneously guarantee two natural desiderata: stability and low optimal regret for all agents. Resource-allocating platforms can point to this result as a justification for assigning good long-term outcomes to some agents and poor ones to others. We show that this impossibility need not hold true. In particular, by modeling two additional components of competition -- namely, costs and transfers -- we prove that it is possible to simultaneously guarantee four desiderata: stability, low optimal regret, fairness in the distribution of regret, and high social welfare.
    Real-time Adversarial Perturbations against Deep Reinforcement Learning Policies: Attacks and Defenses. (arXiv:2106.08746v3 [cs.LG] UPDATED)
    Recent work has shown that deep reinforcement learning (DRL) policies are vulnerable to adversarial perturbations. Adversaries can mislead policies of DRL agents by perturbing the state of the environment observed by the agents. Existing attacks are feasible in principle but face challenges in practice, either by being too slow to fool DRL policies in real time or by modifying past observations stored in the agent's memory. We show that using the Universal Adversarial Perturbation (UAP) method to compute perturbations, independent of the individual inputs to which they are applied, can fool DRL policies effectively and in real time. We describe three such attack variants. Via an extensive evaluation using three Atari 2600 games, we show that our attacks are effective, as they fully degrade the performance of three different DRL agents (up to 100%, even when the $l_\infty$ bound on the perturbation is as small as 0.01). Our attack is faster than the response time (0.6ms on average) of different DRL policies, and considerably faster than prior attacks using adversarial perturbations (1.8ms on average). We also show that our attack technique is efficient, incurring an online computational cost of 0.027ms on average. Using two further tasks involving robotic movement, we confirm that our results generalize to more complex DRL tasks. Furthermore, we demonstrate that the effectiveness of known defenses diminishes against universal perturbations. We propose an effective technique that detects all known adversarial perturbations against DRL policies, including all the universal perturbations presented in this paper.
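    To make the $l_\infty$ bound concrete, the sketch below clips a fixed, input-independent perturbation to the bound and adds it to an observation. It only illustrates the constraint; how the perturbation is computed is not shown, and the function names and the [0, 1] observation range are assumptions for the example.

```python
def project_linf(perturbation, epsilon):
    """Clip every coordinate so the perturbation satisfies |delta_i| <= epsilon."""
    return [max(-epsilon, min(epsilon, p)) for p in perturbation]

def apply_universal_perturbation(observation, perturbation, epsilon,
                                 low=0.0, high=1.0):
    """Add one precomputed perturbation to any observation.

    The same perturbation is reused for every input, so the online cost is
    a single element-wise addition plus clipping to the valid value range.
    """
    delta = project_linf(perturbation, epsilon)
    return [max(low, min(high, o + d)) for o, d in zip(observation, delta)]

# Hypothetical normalised observation and raw perturbation.
observation = [0.0, 0.5, 1.0]
raw_perturbation = [0.5, -0.02, 0.005]
perturbed = apply_universal_perturbation(observation, raw_perturbation, epsilon=0.01)
```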
    Extracting Finite Automata from RNNs Using State Merging. (arXiv:2201.12451v3 [cs.LG] UPDATED)
    One way to interpret the behavior of a blackbox recurrent neural network (RNN) is to extract from it a more interpretable discrete computational model, like a finite state machine, that captures its behavior. In this work, we propose a new method for extracting finite automata from RNNs inspired by the state merging paradigm from grammatical inference. We demonstrate the effectiveness of our method on the Tomita languages benchmark, where we find that it is able to extract faithful automata from RNNs trained on all languages in the benchmark. We find that extraction performance is aided by the number of data provided during the extraction process, as well as, curiously, whether the RNN model is trained for additional epochs after perfectly learning its target language. We use our method to analyze this phenomenon, finding that training beyond convergence is useful because it leads to compression of the internal state space of the RNN. This finding demonstrates how our method can be used for interpretability and analysis of trained RNN models.
    SkillNet: A Sparsely Activated Model for General-Purpose Natural Language Understanding. (arXiv:2203.03312v2 [cs.CL] UPDATED)
    Prevailing deep models are single-purpose and overspecialize at individual tasks. However, when being extended to new tasks, they typically forget previously learned skills and learn from scratch. We address this issue by introducing SkillNet, a general-purpose model that stitches together existing skills to learn new tasks more effectively. The key feature of our approach is that it is sparsely activated guided by predefined skills. Different from traditional dense models that always activate all the model parameters, SkillNet only activates parts of the model parameters whose skills are relevant to the target task. When learning for a new task, our approach precisely activates required skills and also provides an option to add new skills. We evaluate on natural language understanding tasks and report the following findings. First, with only one model checkpoint, SkillNet performs better than task-specific fine-tuning and two multi-task learning baselines (i.e., dense model and Mixture-of-Experts model) on six tasks. Second, sparsely activated pre-training further improves the overall performance. Third, SkillNet significantly outperforms baseline systems when being extended to new tasks.
    Fine-Grained Population Mobility Data-Based Community-Level COVID-19 Prediction Model. (arXiv:2202.06257v2 [cs.LG] UPDATED)
    Predicting the number of infections in the anti-epidemic process is extremely beneficial to the government in developing anti-epidemic strategies, especially in fine-grained geographic units. Previous works focus on low spatial resolution prediction, e.g., county-level, and preprocess data to the same geographic level, which loses some useful information. In this paper, we propose a fine-grained population mobility data-based model (FGC-COVID) utilizing data of two geographic levels for community-level COVID-19 prediction. We use the population mobility data between Census Block Groups (CBGs), which is a finer-grained geographic level than community, to build the graph and capture the dependencies between CBGs using graph neural networks (GNNs). To mine patterns that are as fine-grained as possible for prediction, a spatial weighted aggregation module is introduced to aggregate the embeddings of CBGs to community level based on their geographic affiliation and spatial autocorrelation. Extensive experiments on 300 days of LA city COVID-19 data indicate that our model outperforms existing forecasting models on community-level COVID-19 prediction.
    Semi-Discriminative Representation Loss for Online Continual Learning. (arXiv:2006.11234v4 [stat.ML] UPDATED)
    The use of episodic memory in continual learning has demonstrated effectiveness for alleviating catastrophic forgetting. In recent studies, gradient-based approaches have been developed to make more efficient use of compact episodic memory. Such approaches refine the gradients resulting from new samples by those from memorized samples, aiming to reduce the diversity of gradients from different tasks. In this paper, we clarify the relation between diversity of gradients and discriminativeness of representations, showing shared as well as conflicting interests between Deep Metric Learning and continual learning, thus demonstrating pros and cons of learning discriminative representations in continual learning. Based on these findings, we propose a simple method -- Semi-Discriminative Representation Loss (SDRL) -- for continual learning. In comparison with state-of-the-art methods, SDRL shows better performance with low computational cost on multiple benchmark tasks in the setting of online continual learning.
    DeePN$^2$: A deep learning-based non-Newtonian hydrodynamic model. (arXiv:2112.14798v3 [physics.comp-ph] UPDATED)
    A long standing problem in the modeling of non-Newtonian hydrodynamics of polymeric flows is the availability of reliable and interpretable hydrodynamic models that faithfully encode the underlying micro-scale polymer dynamics. The main complication arises from the long polymer relaxation time, the complex molecular structure and heterogeneous interaction. DeePN$^2$, a deep learning-based non-Newtonian hydrodynamic model, has been proposed and has shown some success in systematically passing the micro-scale structural mechanics information to the macro-scale hydrodynamics for suspensions with simple polymer conformation and bond potential. The model retains a multi-scaled nature by mapping the polymer configurations into a set of symmetry-preserving macro-scale features. The extended constitutive laws for these macro-scale features can be directly learned from the kinetics of their micro-scale counterparts. In this paper, we develop DeePN$^2$ using more complex micro-structural models. We show that DeePN$^2$ can faithfully capture the broadly overlooked viscoelastic differences arising from the specific molecular structural mechanics without human intervention.
    Stream-based Active Learning with Verification Latency in Non-stationary Environments. (arXiv:2204.06822v1 [cs.LG])
    Data stream classification is an important problem in the field of machine learning. Due to the non-stationary nature of the data where the underlying distribution changes over time (concept drift), the model needs to continuously adapt to new data statistics. Stream-based Active Learning (AL) approaches address this problem by interactively querying a human expert to provide new data labels for the most recent samples, within a limited budget. Existing AL strategies assume that labels are immediately available, while in a real-world scenario the expert requires time to provide a queried label (verification latency), and by the time the requested labels arrive they may not be relevant anymore. In this article, we investigate the influence of finite, time-variable, and unknown verification delay, in the presence of concept drift on AL approaches. We propose PRopagate (PR), a latency independent utility estimator which also predicts the requested, but not yet known, labels. Furthermore, we propose a drift-dependent dynamic budget strategy, which uses a variable distribution of the labelling budget over time, after a detected drift. Thorough experimental evaluation, with both synthetic and real-world non-stationary datasets, and different settings of verification latency and budget are conducted and analyzed. We empirically show that the proposed method consistently outperforms the state-of-the-art. Additionally, we demonstrate that with variable budget allocation in time, it is possible to boost the performance of AL strategies, without increasing the overall labeling budget.
    LEFM-Nets: Learnable Explicit Feature Map Deep Networks for Segmentation of Histopathological Images of Frozen Sections. (arXiv:2204.06955v1 [eess.IV])
    Accurate segmentation of medical images is essential for diagnosis and treatment of diseases. These problems are solved by highly complex models, such as deep networks (DN), requiring a large amount of labeled data for training. Thereby, many DNs possess task- or imaging modality specific architectures with a decision-making process that is often hard to explain and interpret. Here, we propose a framework that embeds existing DNs into a low-dimensional subspace induced by the learnable explicit feature map (LEFM) layer. Compared to the existing DN, the framework adds one hyperparameter and only modestly increases the number of learnable parameters. The method is aimed at, but not limited to, segmentation of low-dimensional medical images, such as color histopathological images of stained frozen sections. Since features in the LEFM layer are polynomial functions of the original features, the proposed LEFM-Nets contribute to the interpretability of network decisions. In this work, we combined LEFM with the known networks: DeepLabv3+, UNet, UNet++ and MA-net. New LEFM-Nets are applied to the segmentation of adenocarcinoma of a colon in a liver from images of hematoxylin and eosin (H&E) stained frozen sections. LEFM-Nets are also tested on nuclei segmentation from images of H&E stained frozen sections of ten human organs. On the first problem, LEFM-Nets achieved statistically significant performance improvement in terms of micro balanced accuracy and $F_1$ score over the original networks. LEFM-Nets achieved only better performance in comparison with the original networks on the second problem. The source code is available at https://github.com/dsitnik/lefm.
    A Level Set Theory for Neural Implicit Evolution under Explicit Flows. (arXiv:2204.07159v1 [cs.CV])
    Coordinate-based neural networks parameterizing implicit surfaces have emerged as efficient representations of geometry. They effectively act as parametric level sets with the zero-level set defining the surface of interest. We present a framework that allows applying deformation operations defined for triangle meshes onto such implicit surfaces. Several of these operations can be viewed as energy-minimization problems that induce an instantaneous flow field on the explicit surface. Our method uses the flow field to deform parametric implicit surfaces by extending the classical theory of level sets. We also derive a consolidated view for existing methods on differentiable surface extraction and rendering, by formalizing connections to the level-set theory. We show that these methods drift from the theory and that our approach exhibits improvements for applications like surface smoothing, mean-curvature flow, inverse rendering and user-defined editing on implicit geometry.
    Program Analysis of Probabilistic Programs. (arXiv:2204.06868v1 [cs.PL])
    Probabilistic programming is a growing area that strives to make statistical analysis more accessible, by separating probabilistic modelling from probabilistic inference. In practice this decoupling is difficult. No single inference algorithm can be used as a probabilistic programming back-end that is simultaneously reliable, efficient, black-box, and general. Probabilistic programming languages often choose a single algorithm to apply to a given problem, thus inheriting its limitations. While substantial work has been done both to formalise probabilistic programming and to improve efficiency of inference, there has been little work that makes use of the available program structure, by formally analysing it, to better utilise the underlying inference algorithm. This dissertation presents three novel techniques (both static and dynamic), which aim to improve probabilistic programming using program analysis. The techniques analyse a probabilistic program and adapt it to make inference more efficient, sometimes in a way that would have been tedious or impossible to do by hand.
    Global Counterfactual Explanations: Investigations, Implementations and Improvements. (arXiv:2204.06917v1 [cs.LG])
    Counterfactual explanations have been widely studied in explainability, with a range of application dependent methods emerging in fairness, recourse and model understanding. However, the major shortcoming associated with these methods is their inability to provide explanations beyond the local or instance-level. While some works touch upon the notion of a global explanation, typically suggesting to aggregate masses of local explanations in the hope of ascertaining global properties, few provide frameworks that are either reliable or computationally tractable. Meanwhile, practitioners are requesting more efficient and interactive explainability tools. We take this opportunity to investigate existing global methods, with a focus on implementing and improving Actionable Recourse Summaries (AReS), the only known global counterfactual explanation framework for recourse.
    METRO: Efficient Denoising Pretraining of Large Scale Autoencoding Language Models with Model Generated Signals. (arXiv:2204.06644v1 [cs.LG])
    We present an efficient method of pretraining large-scale autoencoding language models using training signals generated by an auxiliary model. Originated in ELECTRA, this training strategy has demonstrated sample-efficiency to pretrain models at the scale of hundreds of millions of parameters. In this work, we conduct a comprehensive empirical study, and propose a recipe, namely "Model generated dEnoising TRaining Objective" (METRO), which incorporates some of the best modeling techniques developed recently to speed up, stabilize, and enhance pretrained language models without compromising model effectiveness. The resultant models, METRO-LM, consisting of up to 5.4 billion parameters, achieve new state-of-the-art on the GLUE, SuperGLUE, and SQuAD benchmarks. More importantly, METRO-LM are efficient in that they often outperform previous large models with significantly smaller model sizes and lower pretraining cost.
    Performance Assessment of different Machine Learning Algorithm for Life-Time Prediction of Solder Joints based on Synthetic Data. (arXiv:2204.06627v1 [cs.LG])
    This paper proposes a computationally efficient methodology to predict the damage progression in solder contacts of electronic components using temperature-time curves. For this purpose, two machine learning algorithms, a Multilayer Perceptron and a Long Short-Term Memory network, are trained and compared with respect to their prediction accuracy and the required amount of training data. The training is performed using synthetic, normally distributed data that is realistic for automotive applications. A finite element model of a simple bipolar chip resistor in surface mount technology configuration is used to numerically compute the synthetic data. As a result, both machine learning algorithms show a relevant accuracy for the prediction of accumulated creep strains. With a training data length of 350 hours (12.5% of the available training data), both models show a constantly good fitting performance of $R^2$ of 0.72 for the Multilayer Perceptron and $R^2$ of 0.87 for the Long Short-Term Memory network. The prediction errors of the accumulated creep strains are less than 10% with 350 hours of training data and decrease to less than 5% when using further data. Therefore, both approaches are promising for the lifetime prediction directly on the electronic device.
    Wassmap: Wasserstein Isometric Mapping for Image Manifold Learning. (arXiv:2204.06645v1 [cs.LG])
    In this paper, we propose Wasserstein Isometric Mapping (Wassmap), a parameter-free nonlinear dimensionality reduction technique that provides solutions to some drawbacks in existing global nonlinear dimensionality reduction algorithms in imaging applications. Wassmap represents images via probability measures in Wasserstein space, then uses pairwise quadratic Wasserstein distances between the associated measures to produce a low-dimensional, approximately isometric embedding. We show that the algorithm is able to exactly recover parameters of some image manifolds including those generated by translations or dilations of a fixed generating measure. Additionally, we show that a discrete version of the algorithm retrieves parameters from manifolds generated from discrete measures by providing a theoretical bridge to transfer recovery results from functional data to discrete data. Testing of the proposed algorithms on various image data manifolds shows that Wassmap yields good embeddings compared with other global techniques.
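    A small one-dimensional sketch can illustrate why pairwise quadratic Wasserstein distances recover translation parameters. It is not the Wassmap algorithm itself (the embedding step is omitted); it assumes equally weighted empirical measures with the same number of atoms, for which the optimal coupling in 1D simply matches sorted samples. Function names and data are hypothetical.

```python
import math

def w2_empirical_1d(xs, ys):
    """Quadratic Wasserstein distance between two 1D empirical measures
    with the same number of equally weighted atoms (sorted matching)."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample counts")
    pairs = zip(sorted(xs), sorted(ys))
    mean_sq = sum((a - b) ** 2 for a, b in pairs) / len(xs)
    return math.sqrt(mean_sq)

def pairwise_w2(measures):
    """Distance matrix that a subsequent embedding step would consume."""
    n = len(measures)
    return [[w2_empirical_1d(measures[i], measures[j]) for j in range(n)]
            for i in range(n)]

# A fixed generating measure and three translated copies of it.
base = [0.0, 0.5, 1.5, 4.0]
shifts = [0.0, 1.0, 3.0]
measures = [[x + t for x in base] for t in shifts]
distances = pairwise_w2(measures)
```

    In this setting each entry `distances[i][j]` equals the absolute difference between the two shifts, so the distance matrix is already isometric to the translation parameters.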
    Question rewriting? Assessing its importance for conversational question answering. (arXiv:2201.09146v2 [cs.CL] UPDATED)
    In conversational question answering, systems must correctly interpret the interconnected interactions and generate knowledgeable answers, which may require the retrieval of relevant information from a background repository. Recent approaches to this problem leverage neural language models, although different alternatives can be considered in terms of modules for (a) representing user questions in context, (b) retrieving the relevant background information, and (c) generating the answer. This work presents a conversational question answering system designed specifically for the Search-Oriented Conversational AI (SCAI) shared task, and reports on a detailed analysis of its question rewriting module. In particular, we considered different variations of the question rewriting module to evaluate the influence on the subsequent components, and performed a careful analysis of the results obtained with the best system configuration. Our system achieved the best performance in the shared task and our analysis emphasizes the importance of the conversation context representation for the overall system performance.
    Non-stationary Bandits and Meta-Learning with a Small Set of Optimal Arms. (arXiv:2202.13001v3 [cs.LG] UPDATED)
    We study a sequential decision problem where the learner faces a sequence of $K$-armed stochastic bandit tasks. The tasks may be designed by an adversary, but the adversary is constrained to choose the optimal arm of each task in a smaller (but unknown) subset of $M$ arms. The task boundaries might be known (the bandit meta-learning setting), or unknown (the non-stationary bandit setting), and the number of tasks $N$ as well as the total number of rounds $T$ are known ($N$ could be unknown in the meta-learning setting). We design an algorithm based on a reduction to bandit submodular maximization, and show that its regret in both settings is smaller than the simple baseline of $\tilde{O}(\sqrt{KNT})$ that can be obtained by using standard algorithms designed for non-stationary bandit problems. For the bandit meta-learning problem with fixed task length $\tau$, we show that the regret of the algorithm is bounded as $\tilde{O}(N\sqrt{M \tau}+N^{2/3})$. Under additional assumptions on the identifiability of the optimal arms in each task, we show a bandit meta-learning algorithm with an improved $\tilde{O}(N\sqrt{M \tau}+N^{1/2})$ regret.
    LSTM-Autoencoder based Anomaly Detection for Indoor Air Quality Time Series Data. (arXiv:2204.06701v1 [cs.LG])
    Anomaly detection for indoor air quality (IAQ) data has become an important area of research as the quality of air is closely related to human health and well-being. However, traditional statistics and shallow machine learning-based approaches to anomaly detection in the IAQ area cannot detect anomalies involving the observation of correlations across several data points (often referred to as long-term dependences). We propose a hybrid deep learning model that combines LSTM with Autoencoder for anomaly detection tasks in IAQ to address this issue. In our approach, the LSTM network is comprised of multiple LSTM cells that work with each other to learn the long-term dependences of the data in a time-series sequence. Autoencoder identifies the optimal threshold based on the reconstruction loss rates evaluated on every data point across all time-series sequences. Our experimental results, based on the Dunedin CO2 time-series dataset obtained through a real-world deployment in schools in New Zealand, demonstrate a very high and robust accuracy rate (99.50%) that outperforms other similar models.
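    The thresholding step can be sketched independently of the network. The example below assumes reconstruction errors have already been produced by some autoencoder and uses a mean-plus-k-standard-deviations rule; that rule, the value of k and the numbers are illustrative assumptions, not the paper's procedure for choosing the threshold.

```python
import statistics

def fit_threshold(train_errors, k=3.0):
    """Threshold from reconstruction errors on (mostly normal) training data.

    Assumed rule for illustration: mean + k * population standard deviation.
    """
    return statistics.mean(train_errors) + k * statistics.pstdev(train_errors)

def flag_anomalies(errors, threshold):
    """Mark each sequence whose reconstruction error exceeds the threshold."""
    return [e > threshold for e in errors]

# Hypothetical reconstruction errors.
threshold = fit_threshold([0.10, 0.12, 0.08, 0.10])
flags = flag_anomalies([0.11, 0.30], threshold)
```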
    BottleFit: Learning Compressed Representations in Deep Neural Networks for Effective and Efficient Split Computing. (arXiv:2201.02693v2 [cs.LG] UPDATED)
    Although mission-critical applications require the use of deep neural networks (DNNs), their continuous execution at mobile devices results in a significant increase in energy consumption. While edge offloading can decrease energy consumption, erratic patterns in channel quality, network and edge server load can lead to severe disruption of the system's key operations. An alternative approach, called split computing, generates compressed representations within the model (called "bottlenecks"), to reduce bandwidth usage and energy consumption. Prior work has proposed approaches that introduce additional layers, to the detriment of energy consumption and latency. For this reason, we propose a new framework called BottleFit, which, in addition to targeted DNN architecture modifications, includes a novel training strategy to achieve high accuracy even with strong compression rates. We apply BottleFit on cutting-edge DNN models in image classification, and show that BottleFit achieves 77.1% data compression with up to 0.6% accuracy loss on ImageNet dataset, while state of the art such as SPINN loses up to 6% in accuracy. We experimentally measure the power consumption and latency of an image classification application running on an NVIDIA Jetson Nano board (GPU-based) and a Raspberry PI board (GPU-less). We show that BottleFit decreases power consumption and latency respectively by up to 49% and 89% with respect to (w.r.t.) local computing and by 37% and 55% w.r.t. edge offloading. We also compare BottleFit with state-of-the-art autoencoders-based approaches, and show that (i) BottleFit reduces power consumption and execution time respectively by up to 54% and 44% on the Jetson and 40% and 62% on Raspberry PI; (ii) the size of the head model executed on the mobile device is 83 times smaller. We publish the code repository for reproducibility of the results in this study.
    A Natural Language Processing Approach for Instruction Set Architecture Identification. (arXiv:2204.06624v1 [cs.CR])
    Binary analysis of software is a critical step in cyber forensics applications such as program vulnerability assessment and malware detection. This involves interpreting instructions executed by software and often necessitates converting the software's binary file data to assembly language. The conversion process requires information about the binary file's target instruction set architecture (ISA). However, ISA information might not be included in binary files due to compilation errors, partial downloads, or adversarial corruption of file metadata. Machine learning (ML) is a promising methodology that can be used to identify the target ISA using binary data in the object code section of binary files. In this paper we propose a binary code feature extraction model to improve the accuracy and scalability of ML-based ISA identification methods. Our feature extraction model can be used in the absence of domain knowledge about the ISAs. Specifically, we adapt models from natural language processing (NLP) to i) identify successive byte patterns commonly observed in binary codes, ii) estimate the significance of each byte pattern to a binary file, and iii) estimate the relevance of each byte pattern in distinguishing between ISAs. We introduce character-level features of encoded binaries to identify fine-grained bit patterns inherent to each ISA. We use a dataset with binaries from 12 different ISAs to evaluate our approach. Empirical evaluations show that using our byte-level features in ML-based ISA identification results in an 8% higher accuracy than the state-of-the-art features based on byte-histograms and byte pattern signatures. We observe that character-level features allow reducing the size of the feature set by up to 16x while maintaining accuracy above 97%.
    Sign Bit is Enough: A Learning Synchronization Framework for Multi-hop All-reduce with Ultimate Compression. (arXiv:2204.06787v1 [cs.LG])
    Traditional one-bit compressed stochastic gradient descent cannot be directly employed in multi-hop all-reduce, a widely adopted distributed training paradigm in network-intensive high-performance computing systems such as public clouds. According to our theoretical findings, due to the cascading compression, the training process suffers considerable deterioration in convergence performance. To overcome this limitation, we implement a sign-bit compression-based learning synchronization framework, Marsit. It prevents cascading compression via an elaborate bit-wise operation for unbiased sign aggregation and its specific global compensation mechanism for mitigating compression deviation. The proposed framework retains the same theoretical convergence rate as non-compression mechanisms. Experimental results demonstrate that Marsit reduces training time by up to 35% while preserving the same accuracy as training without compression.
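    For readers unfamiliar with one-bit compression, the sketch below shows generic sign compression with a local error-feedback residual. It is not Marsit's bit-wise aggregation or its global compensation mechanism; the mean-absolute-value scale and all names are assumptions chosen for illustration.

```python
def sign_compress(vector):
    """Keep one shared scale and one sign bit per coordinate."""
    scale = sum(abs(v) for v in vector) / len(vector)
    signs = [1 if v >= 0 else -1 for v in vector]
    return scale, signs

def decompress(scale, signs):
    """Reconstruct the approximate vector from scale and signs."""
    return [scale * s for s in signs]

def compress_with_error_feedback(gradient, residual):
    """Add the leftover error from the previous step before compressing,
    then store the new compression error for the next step."""
    corrected = [g + r for g, r in zip(gradient, residual)]
    scale, signs = sign_compress(corrected)
    sent = decompress(scale, signs)
    new_residual = [c - s for c, s in zip(corrected, sent)]
    return scale, signs, new_residual

# Hypothetical gradient with an empty residual on the first step.
scale, signs, residual = compress_with_error_feedback([0.5, -1.5, 1.0],
                                                      [0.0, 0.0, 0.0])
```

    In a multi-hop setting, summing two such compressed vectors no longer yields a one-bit vector, so a naive scheme has to compress again at every hop; that repeated step is the cascading compression the abstract refers to.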
    Efficient and practical quantum compiler towards multi-qubit systems with deep reinforcement learning. (arXiv:2204.06904v1 [quant-ph])
    Efficient quantum compiling tactics greatly enhance the capability of quantum computers to execute complicated quantum algorithms. Due to its fundamental importance, a plethora of quantum compilers has been designed in past years. However, there are several caveats to current protocols, which are low optimality, high inference time, limited scalability, and lack of universality. To compensate for these defects, here we devise an efficient and practical quantum compiler assisted by advanced deep reinforcement learning (RL) techniques, i.e., data generation, deep Q-learning, and AQ* search. In this way, our protocol is compatible with various quantum machines and can be used to compile multi-qubit operators. We systematically evaluate the performance of our proposal in compiling quantum operators with both inverse-closed and inverse-free universal basis sets. In the task of single-qubit operator compiling, our proposal outperforms other RL-based quantum compilers in the measure of compiling sequence length and inference time. Meanwhile, the output solution is near-optimal, guaranteed by the Solovay-Kitaev theorem. Notably, for the inverse-free universal basis set, the achieved sequence length complexity is comparable with the inverse-based setting and dramatically advances previous methods. These empirical results contribute to improving the inverse-free Solovay-Kitaev theorem. In addition, for the first time, we demonstrate how to leverage RL-based quantum compilers to accomplish two-qubit operator compiling. The achieved results open an avenue for integrating RL with quantum compiling to unify efficiency and practicality and thus facilitate the exploration of quantum advantages.
    MIMO Channel Estimation using Score-Based Generative Models. (arXiv:2204.07122v1 [eess.SP])
    Channel estimation is a critical task in multiple-input multiple-output digital communications that has effects on end-to-end system performance. In this work, we introduce a novel approach for channel estimation using deep score-based generative models. These models are trained to estimate the gradient of the log-prior distribution, and can be used to iteratively refine estimates, given observed measurements of a signal. We introduce a framework for training score-based generative models for wireless channels, as well as performing channel estimation using posterior sampling at test time. We derive theoretical robustness guarantees of channel estimation with posterior sampling in single-input single-output scenarios, and show that the observations regarding estimation performance are verified experimentally in MIMO channels. Our results in simulated clustered delay line channels show competitive in-distribution performance without error floors in the high signal-to-noise ratio regime, and robust out-of-distribution performance, outperforming competing deep learning methods by up to 5 dB in end-to-end communication performance, while the complexity analysis reveals how model architecture can efficiently trade performance for estimation latency.
    data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language. (arXiv:2202.03555v2 [cs.LG] UPDATED)
    While the general idea of self-supervised learning is identical across modalities, the actual algorithms and objectives differ widely because they were developed with a single modality in mind. To get us closer to general self-supervised learning, we present data2vec, a framework that uses the same learning method for either speech, NLP or computer vision. The core idea is to predict latent representations of the full input data based on a masked view of the input in a self-distillation setup using a standard Transformer architecture. Instead of predicting modality-specific targets such as words, visual tokens or units of human speech which are local in nature, data2vec predicts contextualized latent representations that contain information from the entire input. Experiments on the major benchmarks of speech recognition, image classification, and natural language understanding demonstrate a new state of the art or competitive performance to predominant approaches.
    From Environmental Sound Representation to Robustness of 2D CNN Models Against Adversarial Attacks. (arXiv:2204.07018v1 [cs.SD])
    This paper investigates the impact of different standard environmental sound representations (spectrograms) on the recognition performance and adversarial attack robustness of a victim residual convolutional neural network, namely ResNet-18. Our main motivation for focusing on such a front-end classifier rather than other complex architectures is balancing recognition accuracy and the total number of training parameters. Herein, we measure the impact of different settings required for generating more informative Mel-frequency cepstral coefficient (MFCC), short-time Fourier transform (STFT), and discrete wavelet transform (DWT) representations on our front-end model. This measurement involves comparing the classification performance against the adversarial robustness. We demonstrate an inverse relationship between recognition accuracy and model robustness against six benchmarking attack algorithms on the balance of average budgets allocated by the adversary and the attack cost. Moreover, our experimental results have shown that while the ResNet-18 model trained on DWT spectrograms achieves a high recognition accuracy, attacking this model is relatively more costly for the adversary than other 2D representations. We also report some results on other architectures, namely the convolutional networks ResNet-34, ResNet-56, AlexNet, GoogLeNet, and SB-CNN, as well as LSTM-based models.  ( 2 min )
    Learning Convolutional Neural Networks in Frequency Domain. (arXiv:2204.06718v1 [cs.CV])
    Convolutional neural networks (CNNs) have achieved impressive success in computer vision over the past few decades. As the core of CNNs, the image convolution operation helps them perform well on image-related tasks. However, image convolution is hard to implement and parallelize. In this paper, we propose a novel neural network model, namely CEMNet, that can be trained in the frequency domain. The main motivation of this research is that, by the Cross-Correlation Theorem, the image convolution can be replaced by a very simple element-wise multiplication in the frequency domain. We further introduce a Weight Fixation Mechanism to alleviate over-fitting, and analyze the working behavior of Batch Normalization, Leaky ReLU and Dropout in the frequency domain to design their counterparts for CEMNet. Also, to deal with the complex inputs produced by the DFT, we design a two-branch network structure for CEMNet. Experimental results imply that CEMNet works well in the frequency domain and achieves good performance on the MNIST and CIFAR-10 databases. To our knowledge, CEMNet is the first model trained in the Fourier domain that achieves more than 70% validation accuracy on the CIFAR-10 database.  ( 2 min )
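    The cross-correlation theorem mentioned in the abstract above can be checked numerically. The following is a minimal 1D sketch in plain Python, not the CEMNet implementation: the function names, the naive O(n^2) DFT, and the circular (wrap-around) boundary handling are all assumptions made for illustration.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform (O(n^2)), for illustration only."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
            for f in range(n)]

def idft(X):
    """Naive inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[f] * cmath.exp(2j * cmath.pi * f * t / n) for f in range(n)) / n
            for t in range(n)]

def circular_cross_correlation(signal, kernel):
    """Direct form: out[i] = sum_m kernel[m] * signal[(i + m) mod n]."""
    n = len(signal)
    return [sum(kernel[m] * signal[(i + m) % n] for m in range(len(kernel)))
            for i in range(n)]

def frequency_cross_correlation(signal, kernel):
    """Same result via element-wise multiplication in the frequency domain.

    For a real-valued kernel, the theorem gives
    DFT(out)[f] = conj(DFT(kernel)[f]) * DFT(signal)[f].
    """
    n = len(signal)
    padded = list(kernel) + [0.0] * (n - len(kernel))  # zero-pad kernel to length n
    S, K = dft(signal), dft(padded)
    product = [K[f].conjugate() * S[f] for f in range(n)]
    return [v.real for v in idft(product)]

signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
kernel = [1.0, 0.0, -1.0]
direct = circular_cross_correlation(signal, kernel)
spectral = frequency_cross_correlation(signal, kernel)
print(all(abs(a - b) < 1e-9 for a, b in zip(direct, spectral)))
```

    Both routes give the same output up to floating-point error. In practice an FFT would replace the naive DFT, and 2D images follow the same identity applied along both axes.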
    Learning Optimal Dynamic Treatment Regimes Using Causal Tree Methods in Medicine. (arXiv:2204.07124v1 [stat.ML])
    Dynamic treatment regimes (DTRs) are used in medicine to tailor sequential treatment decisions to patients by considering patient heterogeneity. Common methods for learning optimal DTRs, however, have shortcomings: they are typically based on outcome prediction and not treatment effect estimation, or they use linear models that are restrictive for patient data from modern electronic health records. To address these shortcomings, we develop two novel methods for learning optimal DTRs that effectively handle complex patient data. We call our methods DTR-CT and DTR-CF. Our methods are based on a data-driven estimation of heterogeneous treatment effects using causal tree methods, specifically causal trees and causal forests, that learn non-linear relationships, control for time-varying confounding, are doubly robust, and explainable. To the best of our knowledge, our paper is the first that adapts causal tree methods for learning optimal DTRs. We evaluate our proposed methods using synthetic data and then apply them to real-world data from intensive care units. Our methods outperform state-of-the-art baselines in terms of cumulative regret and percentage of optimal decisions by a considerable margin. Our work improves treatment recommendations from electronic health records and is thus of direct relevance for personalized medicine.  ( 2 min )
    Joint Coreset Construction and Quantization for Distributed Machine Learning. (arXiv:2204.06652v1 [cs.LG])
    Coresets are small, weighted summaries of larger datasets, aiming at providing provable error bounds for machine learning (ML) tasks while significantly reducing the communication and computation costs. To achieve a better trade-off between ML error bounds and costs, we propose the first framework to incorporate quantization techniques into the process of coreset construction. Specifically, we theoretically analyze the ML error bounds caused by a combination of coreset construction and quantization. Based on that, we formulate an optimization problem to minimize the ML error under a fixed budget of communication cost. To improve the scalability for large datasets, we identify two proxies of the original objective function, for which efficient algorithms are developed. For the case of data on multiple nodes, we further design a novel algorithm to allocate the communication budget to the nodes while minimizing the overall ML error. Through extensive experiments on multiple real-world datasets, we demonstrate the effectiveness and efficiency of our proposed algorithms for a variety of ML tasks. In particular, our algorithms have achieved more than 90% data reduction with less than 10% degradation in ML performance in most cases.  ( 2 min )
    Control-oriented meta-learning. (arXiv:2204.06716v1 [cs.RO])
    Real-time adaptation is imperative to the control of robots operating in complex, dynamic environments. Adaptive control laws can endow even nonlinear systems with good trajectory tracking performance, provided that any uncertain dynamics terms are linearly parameterizable with known nonlinear features. However, it is often difficult to specify such features a priori, such as for aerodynamic disturbances on rotorcraft or interaction forces between a manipulator arm and various objects. In this paper, we turn to data-driven modeling with neural networks to learn, offline from past data, an adaptive controller with an internal parametric model of these nonlinear features. Our key insight is that we can better prepare the controller for deployment with control-oriented meta-learning of features in closed-loop simulation, rather than regression-oriented meta-learning of features to fit input-output data. Specifically, we meta-learn the adaptive controller with closed-loop tracking simulation as the base-learner and the average tracking error as the meta-objective. With both fully-actuated and underactuated nonlinear planar rotorcraft subject to wind, we demonstrate that our adaptive controller outperforms other controllers trained with regression-oriented meta-learning when deployed in closed-loop for trajectory tracking control.  ( 2 min )
    Leveraging convergence behavior to balance conflicting tasks in multi-task learning. (arXiv:2204.06698v1 [cs.LG])
    Multi-Task Learning is a learning paradigm that uses correlated tasks to improve performance generalization. A common way to learn multiple tasks is through the hard parameter sharing approach, in which a single architecture is used to share the same subset of parameters, creating an inductive bias between them during the training process. Due to its simplicity, its potential to improve generalization, and its reduced computational cost, it has gained the attention of the scientific and industrial communities. However, tasks often conflict with each other, which makes it challenging to define how the gradients of multiple tasks should be combined to allow simultaneous learning. To address this problem, we use the idea of multi-objective optimization to propose a method that takes into account the temporal behaviour of the gradients to create a dynamic bias that adjusts the importance of each task during backpropagation. This method gives more attention to the tasks that are diverging or that have not benefited during the last iterations, ensuring that the simultaneous learning heads toward the performance maximization of all tasks. As a result, we empirically show that the proposed method outperforms state-of-the-art approaches on learning conflicting tasks. Unlike the adopted baselines, our method ensures that all tasks reach good generalization performance.  ( 2 min )
    Multifidelity deep neural operators for efficient learning of partial differential equations with application to fast inverse design of nanoscale heat transport. (arXiv:2204.06684v1 [physics.comp-ph])
    Deep neural operators can learn operators mapping between infinite-dimensional function spaces via deep neural networks and have become an emerging paradigm of scientific machine learning. However, training neural operators usually requires a large amount of high-fidelity data, which is often difficult to obtain in real engineering problems. Here, we address this challenge by using multifidelity learning, i.e., learning from multifidelity datasets. We develop a multifidelity neural operator based on a deep operator network (DeepONet). A multifidelity DeepONet includes two standard DeepONets coupled by residual learning and input augmentation. Multifidelity DeepONet significantly reduces the required amount of high-fidelity data and achieves an error one order of magnitude smaller when using the same amount of high-fidelity data. We apply a multifidelity DeepONet to learn the phonon Boltzmann transport equation (BTE), a framework to compute nanoscale heat transport. By combining a trained multifidelity DeepONet with a genetic algorithm or topology optimization, we demonstrate a fast solver for the inverse design of BTE problems.  ( 2 min )
    Leveraging Natural Learning Processing to Uncover Themes in Clinical Notes of Patients Admitted for Heart Failure. (arXiv:2204.07074v1 [cs.LG])
    Heart failure occurs when the heart is not able to pump blood and oxygen to support other organs in the body as it should. Treatments include medications and sometimes hospitalization. Patients with heart failure can have both cardiovascular as well as non-cardiovascular comorbidities. Clinical notes of patients with heart failure can be analyzed to gain insight into the topics discussed in these notes and the major comorbidities in these patients. In this regard, we apply machine learning techniques, such as topic modeling, to identify the major themes found in the clinical notes specific to the procedures performed on 1,200 patients admitted for heart failure at the University of Illinois Hospital and Health Sciences System (UI Health). Topic modeling revealed five hidden themes in these clinical notes, including one related to heart disease comorbidities.  ( 2 min )
    The Vision of Self-Evolving Computing Systems. (arXiv:2204.06825v1 [cs.SE])
    Computing systems are omnipresent; their sustainability has become crucial for our society. A key aspect of this sustainability is the ability of computing systems to cope with the continuous change they face, ranging from dynamic operating conditions, to changing goals, and technological progress. While we are able to engineer smart computing systems that autonomously deal with various types of changes, handling unanticipated changes requires system evolution, which remains in essence a human-centered process. This will eventually become unmanageable. To break through the status quo, we put forward an arguable opinion for the vision of self-evolving computing systems that are equipped with an evolutionary engine enabling them to evolve autonomously. Specifically, when a self-evolving computing system detects conditions outside its operational domain, such as an anomaly or a new goal, it activates an evolutionary engine that runs online experiments to determine how the system needs to evolve to deal with the changes, thereby evolving its architecture. During this process the engine can integrate new computing elements that are provided by computing warehouses. These computing elements provide specifications and procedures enabling their automatic integration. We motivate the need for self-evolving computing systems in light of the state of the art, outline a conceptual architecture of self-evolving computing systems, and illustrate the architecture for a future smart city mobility system that needs to evolve continuously with changing conditions. To conclude, we highlight key research challenges to realize the vision of self-evolving computing systems.  ( 2 min )
    Sim-to-Real 6D Object Pose Estimation via Iterative Self-training for Robotic Bin-picking. (arXiv:2204.07049v1 [cs.RO])
    In this paper, we propose an iterative self-training framework for sim-to-real 6D object pose estimation to facilitate cost-effective robotic grasping. Given a bin-picking scenario, we establish a photo-realistic simulator to synthesize abundant virtual data, and use this to train an initial pose estimation network. This network then takes the role of a teacher model, which generates pose predictions for unlabeled real data. With these predictions, we further design a comprehensive adaptive selection scheme to distinguish reliable results, and leverage them as pseudo labels to update a student model for pose estimation on real data. To continuously improve the quality of pseudo labels, we iterate the above steps by taking the trained student model as a new teacher and re-label real data using the refined teacher model. We evaluate our method on a public benchmark and our newly-released dataset, achieving an ADD(-S) improvement of 11.49% and 22.62% respectively. Our method is also able to improve robotic bin-picking success by 19.54%, demonstrating the potential of iterative sim-to-real solutions for robotic applications.  ( 2 min )
    Magnetic Resonance Spectroscopy Deep Learning Denoising Using Few In Vivo Data. (arXiv:2101.11442v2 [physics.med-ph] UPDATED)
    Magnetic Resonance Spectroscopy (MRS) is a noninvasive tool to reveal metabolic information. One challenge of 1H-MRS is the low Signal-Noise Ratio (SNR). To improve the SNR, a typical approach is to perform Signal Averaging (SA) with M repeated samples. The data acquisition time, however, is increased by M times accordingly, and a complete clinical MRS scan takes approximately 10 minutes at a common setting M=128. Recently, deep learning has been introduced to improve the SNR, but most such methods use simulated data as the training set. This may hinder MRS applications, since potential differences, such as acquisition system imperfections and physiological and psychological conditions, may exist between the simulated and in vivo data. Here, we proposed a new scheme that purely used the repeated samples of realistic data. A deep learning model, Refusion Long Short-Term Memory (ReLSTM), was designed to learn the mapping from the low SNR time-domain data (24 SA) to the high SNR one (128 SA). Experiments on the in vivo brain spectra of 7 healthy subjects, 2 brain tumor patients and 1 cerebral infarction patient showed that, using only 20% of the repeated samples, the spectra denoised by ReLSTM could provide estimated concentrations of metabolites comparable to 128 SA. Compared with the state-of-the-art low-rank denoising method, ReLSTM achieved lower relative error and lower Cramér-Rao lower bounds in quantifying some important biomarkers. In summary, ReLSTM can perform high-fidelity denoising of the spectra under fast acquisition (24 SA), which would be valuable to MRS clinical studies.  ( 2 min )
    Formal Language Recognition by Hard Attention Transformers: Perspectives from Circuit Complexity. (arXiv:2204.06618v1 [cs.CC])
    This paper analyzes three formal models of Transformer encoders that differ in the form of their self-attention mechanism: unique hard attention (UHAT); generalized unique hard attention (GUHAT), which generalizes UHAT; and averaging hard attention (AHAT). We show that UHAT and GUHAT Transformers, viewed as string acceptors, can only recognize formal languages in the complexity class AC$^0$, the class of languages recognizable by families of Boolean circuits of constant depth and polynomial size. This upper bound subsumes Hahn's (2020) results that GUHAT cannot recognize the DYCK languages or the PARITY language, since those languages are outside AC$^0$ (Furst et al., 1984). In contrast, the non-AC$^0$ languages MAJORITY and DYCK-1 are recognizable by AHAT networks, implying that AHAT can recognize languages that UHAT and GUHAT cannot.  ( 2 min )
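    For readers unfamiliar with the formal languages named in the abstract above, here is a minimal Python sketch of membership tests for PARITY, MAJORITY and DYCK-1. The function names are hypothetical, and the conventions used (PARITY as bit strings with an odd number of 1s, MAJORITY as strictly more 1s than 0s, DYCK-1 over a single bracket pair) are common textbook definitions assumed here, not definitions taken from the paper.

```python
def is_parity(bits: str) -> bool:
    """PARITY (assumed convention): bit strings containing an odd number of 1s."""
    return bits.count("1") % 2 == 1

def is_majority(bits: str) -> bool:
    """MAJORITY (assumed convention): bit strings with strictly more 1s than 0s."""
    return bits.count("1") > bits.count("0")

def is_dyck1(s: str) -> bool:
    """DYCK-1: well-balanced strings over a single bracket pair '(' and ')'."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:      # a closing bracket with no matching opener
                return False
        else:
            return False       # symbol outside the alphabet
    return depth == 0

print(is_parity("1101"), is_majority("1100"), is_dyck1("(()())"))
```

    Each test depends on a count over the whole input, which is what makes these languages useful probes of what a fixed-depth model can recognize.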
    Network state Estimation using Raw Video Analysis: vQoS-GAN based non-intrusive Deep Learning Approach. (arXiv:2204.07062v1 [cs.MM])
    Content-based providers transmit complex real-time signals, such as video data, from one region to another. During transmission, the signals usually end up distorted or degraded, and the actual information present in the video is lost. This commonly happens in video streaming service applications. Hence, there is a need to know the level of degradation that occurred on the receiver side. This video degradation can be estimated from network state parameters such as data rate and packet loss values. Our proposed solution, vQoS GAN (video Quality of Service Generative Adversarial Network), can estimate the network state parameters from the degraded received video data using a deep learning approach based on a semi-supervised generative adversarial network algorithm. A robust and unique deep learning network model has been trained with the video data along with data rate and packet loss class labels, and achieves over 95 percent training accuracy. The proposed semi-supervised generative adversarial network can additionally reconstruct the degraded video data to its original form for a better end-user experience.  ( 2 min )
    Clifford Circuits can be Properly PAC Learned if and only if $\textsf{RP}=\textsf{NP}$. (arXiv:2204.06638v1 [quant-ph])
    Given a dataset of input states, measurements, and probabilities, is it possible to efficiently predict the measurement probabilities associated with a quantum circuit? Recent work of Caro and Datta (2020) studied the problem of PAC learning quantum circuits in an information theoretic sense, leaving open questions of computational efficiency. In particular, one candidate class of circuits for which an efficient learner might have been possible was that of Clifford circuits, since the corresponding set of states generated by such circuits, called stabilizer states, are known to be efficiently PAC learnable (Rocchetto 2018). Here we provide a negative result, showing that proper learning of CNOT circuits is hard for classical learners unless $\textsf{RP} = \textsf{NP}$. As the classical analogue and subset of Clifford circuits, this naturally leads to a hardness result for Clifford circuits as well. Additionally, we show that if $\textsf{RP} = \textsf{NP}$ then there would exist efficient proper learning algorithms for CNOT and Clifford circuits. By similar arguments, we also find that an efficient proper quantum learner for such circuits exists if and only if $\textsf{NP} \subseteq \textsf{RQP}$.  ( 2 min )
    EEG-ITNet: An Explainable Inception Temporal Convolutional Network for Motor Imagery Classification. (arXiv:2204.06947v1 [cs.LG])
    In recent years, neural networks and especially deep architectures have received substantial attention for EEG signal analysis in the field of brain-computer interfaces (BCIs). In this ongoing research area, end-to-end models are favoured over traditional approaches that require signal transformation before classification. They can eliminate the need for prior information from experts and the extraction of handcrafted features. However, although several deep learning algorithms have already been proposed in the literature, achieving high accuracies for classifying motor movements or mental tasks, they often lack interpretability and are therefore not quite favoured by the neuroscience community. The reasons behind this issue can be the high number of parameters and the tendency of deep neural networks to capture tiny yet unrelated discriminative features. We propose an end-to-end deep learning architecture called EEG-ITNet and a more comprehensible method to visualise the patterns learned by the network. Using inception modules and causal convolutions with dilation, our model can extract rich spectral, spatial, and temporal information from multi-channel EEG signals with less complexity (in terms of the number of trainable parameters) than other existing end-to-end architectures, such as EEG-Inception and EEG-TCNet. In an exhaustive evaluation on dataset 2a from BCI competition IV and the OpenBMI motor imagery dataset, EEG-ITNet shows up to 5.9% improvement in classification accuracy in different scenarios, with statistical significance, compared to its competitors. We also comprehensively explain and support the validity of the network illustration from a neuroscientific perspective. We have also made our code open at https://github.com/AbbasSalami/EEG-ITNet  ( 2 min )
    Sketch guided and progressive growing GAN for realistic and editable ultrasound image synthesis. (arXiv:2204.06929v1 [eess.IV])
    Ultrasound (US) imaging is widely used for anatomical structure inspection in clinical diagnosis. The training of new sonographers and deep learning based algorithms for US image analysis usually requires a large amount of data. However, obtaining and labeling large-scale US imaging data are not easy tasks, especially for diseases with low incidence. Realistic US image synthesis can alleviate this problem to a great extent. In this paper, we propose a generative adversarial network (GAN) based image synthesis framework. Our main contributions include: 1) we present the first work that can synthesize realistic B-mode US images with high-resolution and customized texture editing features; 2) to enhance structural details of generated images, we propose to introduce auxiliary sketch guidance into a conditional GAN. We superpose the edge sketch onto the object mask and use the composite mask as the network input; 3) to generate high-resolution US images, we adopt a progressive training strategy to gradually generate high-resolution images from low-resolution images. In addition, a feature loss is proposed to minimize the difference of high-level features between the generated and real images, which further improves the quality of generated images; 4) the proposed US image synthesis method is quite universal and can also be generalized to the US images of other anatomical structures besides the three ones tested in our study (lung, hip joint, and ovary); 5) extensive experiments on three large US image datasets are conducted to validate our method. Ablation studies, customized texture editing, user studies, and segmentation tests demonstrate promising results of our method in synthesizing realistic US images.  ( 2 min )
    Modularity benefits reinforcement learning agents with competing homeostatic drives. (arXiv:2204.06608v1 [cs.LG])
    The problem of balancing conflicting needs is fundamental to intelligence. Standard reinforcement learning algorithms maximize a scalar reward, which requires combining different objective-specific rewards into a single number. Alternatively, different objectives could also be combined at the level of action value, such that specialist modules responsible for different objectives submit different action suggestions to a decision process, each based on rewards that are independent of one another. In this work, we explore the potential benefits of this alternative strategy. We investigate a biologically relevant multi-objective problem, the continual homeostasis of a set of variables, and compare a monolithic deep Q-network to a modular network with a dedicated Q-learner for each variable. We find that the modular agent: a) requires minimal exogenously determined exploration; b) has improved sample efficiency; and c) is more robust to out-of-domain perturbation.  ( 2 min )
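    The idea in the abstract above of combining objectives at the level of action value can be sketched with tabular Q-learners. This is an illustrative sketch, not the paper's deep Q-network setup: the class and function names are hypothetical, and summing the modules' Q-values before taking the argmax is one possible decision rule assumed here.

```python
from collections import defaultdict

class QModule:
    """A tabular Q-learner dedicated to one objective (hypothetical helper)."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.9):
        self.q = defaultdict(lambda: [0.0] * n_actions)
        self.alpha = alpha
        self.gamma = gamma

    def values(self, state):
        return self.q[state]

    def update(self, state, action, reward, next_state):
        # Standard Q-learning update using only this module's own reward.
        target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])

def select_action(modules, state):
    """Assumed decision rule: sum the modules' action values, pick the greedy action."""
    n_actions = len(modules[0].values(state))
    totals = [sum(m.values(state)[a] for m in modules) for a in range(n_actions)]
    return max(range(n_actions), key=totals.__getitem__)

# Two modules, e.g. one per homeostatic variable, each seeing a separate reward.
food, water = QModule(n_actions=2), QModule(n_actions=2)
food.update("s", action=0, reward=1.0, next_state="s2")
water.update("s", action=1, reward=0.5, next_state="s2")
first_choice = select_action([food, water], "s")
water.update("s", action=1, reward=2.0, next_state="s2")
second_choice = select_action([food, water], "s")
print(first_choice, second_choice)
```

    A monolithic agent would instead add the two rewards into one scalar and train a single learner on it; here the rewards stay separate and only the action values meet in the decision step.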
    Learning Invariances with Generalised Input-Convex Neural Networks. (arXiv:2204.07009v1 [cs.LG])
    Considering smooth mappings from input vectors to continuous targets, our goal is to characterise subspaces of the input domain that are invariant under such mappings. Thus, we want to characterise manifolds implicitly defined by level sets. Specifically, this characterisation should be of a global parametric form, which is especially useful for different informed data exploration tasks, such as building grid-based approximations, sampling points along the level curves, or finding trajectories on the manifold. However, global parameterisations can only exist if the level sets are connected. For this purpose, we introduce a novel and flexible class of neural networks that generalise input-convex networks. These networks represent functions that are guaranteed to have connected level sets forming smooth manifolds on the input space. We further show that global parameterisations of these level sets can always be found efficiently. Lastly, we demonstrate that our novel technique for characterising invariances is a powerful generative data exploration tool in real-world applications, such as computational chemistry.  ( 2 min )
    Scalable and Robust Self-Learning for Skill Routing in Large-Scale Conversational AI Systems. (arXiv:2204.07135v1 [cs.LG])
    Skill routing is an important component in large-scale conversational systems. In contrast to traditional rule-based skill routing, state-of-the-art systems use a model-based approach to enable natural conversations. To provide the supervision signal required to train such models, ideas such as human annotation, replication of a rule-based system, relabeling based on user paraphrases, and bandit-based learning have been suggested. However, these approaches: (a) do not scale in terms of the number of skills and skill on-boarding, (b) require very costly expert annotation or rule design, and (c) introduce risks in the user experience with each model update. In this paper, we present a scalable self-learning approach to explore routing alternatives without causing abrupt policy changes that break the user experience, learn from the user interaction, and incrementally improve the routing via frequent model refreshes. To enable such robust frequent model updates, we suggest a simple and effective approach that ensures controlled policy updates for individual domains, followed by an off-policy evaluation for making deployment decisions without any need for lengthy A/B experimentation. We conduct various offline and online A/B experiments on a commercial large-scale conversational system to demonstrate the effectiveness of the proposed method in real-world production settings.
    SVAM: Saliency-guided Visual Attention Modeling by Autonomous Underwater Robots. (arXiv:2011.06252v2 [cs.CV] UPDATED)
    This paper presents a holistic approach to saliency-guided visual attention modeling (SVAM) for use by autonomous underwater robots. Our proposed model, named SVAM-Net, integrates deep visual features at various scales and semantics for effective salient object detection (SOD) in natural underwater images. The SVAM-Net architecture is configured in a unique way to jointly accommodate bottom-up and top-down learning within two separate branches of the network while sharing the same encoding layers. We design dedicated spatial attention modules (SAMs) along these learning pathways to exploit the coarse-level and fine-level semantic features for SOD at four stages of abstractions. The bottom-up branch performs a rough yet reasonably accurate saliency estimation at a fast rate, whereas the deeper top-down branch incorporates a residual refinement module (RRM) that provides fine-grained localization of the salient objects. Extensive performance evaluation of SVAM-Net on benchmark datasets clearly demonstrates its effectiveness for underwater SOD. We also validate its generalization performance by several ocean trials' data that include test images of diverse underwater scenes and waterbodies, and also images with unseen natural objects. Moreover, we analyze its computational feasibility for robotic deployments and demonstrate its utility in several important use cases of visual attention modeling.
    Ensemble learning using individual neonatal data for seizure detection. (arXiv:2204.07043v1 [eess.SP])
    Sharing medical data between institutions is difficult in practice due to data protection laws and official procedures within institutions. Therefore, most existing algorithms are trained on relatively small electroencephalogram (EEG) data sets, which is likely to be detrimental to prediction accuracy. In this work, we simulate a case in which the data cannot be shared by splitting the publicly available data set into disjoint sets representing data in individual institutions. We propose to train a (local) detector in each institution and aggregate their individual predictions into one final prediction. Four aggregation schemes are compared, namely, the majority vote, the mean, the weighted mean and the Dawid-Skene method. The approach allows different detector architectures amongst the institutions. The method was validated on an independent data set using only a subset of EEG channels. The ensemble reaches accuracy comparable to a single detector trained on all the data when a sufficient amount of data is available in each institution. The weighted mean aggregation scheme showed the best overall performance; it was only marginally outperformed by the Dawid-Skene method when local detectors approach the performance of a single detector trained on all available data.
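    Three of the four aggregation schemes listed in the abstract above (majority vote, mean, weighted mean) are simple enough to sketch in a few lines of Python. The function names, the 0.5 decision threshold, and the example weights are assumptions made for illustration; the abstract does not say how the weights are chosen, and the Dawid-Skene method is omitted because it requires an iterative estimation procedure.

```python
def majority_vote(probs, threshold=0.5):
    """Each local detector casts a binary vote; a strict majority decides."""
    votes = sum(1 for p in probs if p >= threshold)
    return 1 if votes * 2 > len(probs) else 0

def mean_aggregate(probs, threshold=0.5):
    """Average the detectors' seizure probabilities, then threshold."""
    return 1 if sum(probs) / len(probs) >= threshold else 0

def weighted_mean_aggregate(probs, weights, threshold=0.5):
    """Weighted average of probabilities; the weights here are hypothetical."""
    score = sum(w * p for w, p in zip(weights, probs)) / sum(weights)
    return 1 if score >= threshold else 0

# Seizure probabilities from three local detectors for one EEG segment.
probs = [0.9, 0.4, 0.45]
weights = [1.0, 3.0, 3.0]  # assumed, e.g. proportional to some validation score
print(majority_vote(probs), mean_aggregate(probs),
      weighted_mean_aggregate(probs, weights))
```

    On this example the three schemes disagree: one confident detector is outvoted under majority voting, dominates the plain mean, and is outweighed again once the other two detectors receive larger weights.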
    Surrogate NAS Benchmarks: Going Beyond the Limited Search Spaces of Tabular NAS Benchmarks. (arXiv:2008.09777v4 [cs.LG] UPDATED)
    The most significant barrier to the advancement of Neural Architecture Search (NAS) is its demand for large computational resources, which hinders scientifically sound empirical evaluations of NAS methods. Tabular NAS benchmarks have alleviated this problem substantially, making it possible to properly evaluate NAS methods in seconds on commodity machines. However, an unintended consequence of tabular NAS benchmarks has been a focus on extremely small architectural search spaces since their construction relies on exhaustive evaluations of the space. This leads to unrealistic results that do not transfer to larger spaces. To overcome this fundamental limitation, we propose a methodology to create cheap NAS surrogate benchmarks for arbitrary search spaces. We exemplify this approach by creating surrogate NAS benchmarks on the existing tabular NAS-Bench-101 and on two widely used NAS search spaces with up to $10^{21}$ architectures ($10^{13}$ times larger than any previous tabular NAS benchmark). We show that surrogate NAS benchmarks can model the true performance of architectures better than tabular benchmarks (at a small fraction of the cost), that they lead to faithful estimates of how well different NAS methods work on the original non-surrogate benchmark, and that they can generate new scientific insight. We open-source all our code and believe that surrogate NAS benchmarks are an indispensable tool to extend scientifically sound work on NAS to large and exciting search spaces.
  • Open

    Incompleteness of graph convolutional neural networks for points clouds in three dimensions. (arXiv:2201.07136v2 [stat.ML] UPDATED)
    Graph neural networks (GNN) are very popular methods in machine learning and have been applied very successfully to the prediction of the properties of molecules and materials. First-order GNNs are well known to be incomplete, i.e., there exist graphs that are distinct but appear identical when seen through the lens of the GNN. More complicated schemes have thus been designed to increase their resolving power. Applications to molecules (and more generally, point clouds), however, add a geometric dimension to the problem. The most straightforward and prevalent approach to construct graph representation for molecules regards atoms as vertices in a graph and draws a bond between each pair of atoms within a chosen cutoff. Bonds can be decorated with the distance between atoms, and the resulting "distance graph NNs" (dGNN) have empirically demonstrated excellent resolving power and are widely used in chemical ML, with all known indistinguishable graphs being resolved in the fully-connected limit. Here we show that even for the restricted case of fully-connected graphs induced by 3D atom clouds dGNNs are not complete. We construct pairs of distinct point clouds that generate graphs that, for any cutoff radius, are equivalent based on a first-order Weisfeiler-Lehman test. This class of degenerate structures includes chemically-plausible configurations, setting an ultimate limit to the expressive power of some of the well-established GNN architectures for atomistic machine learning. Models that explicitly use angular or directional information in the description of atomic environments can resolve these degeneracies.
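To illustrate the kind of incompleteness the abstract above refers to, here is a small sketch of first-order Weisfeiler-Lehman colour refinement on plain (unweighted) graphs. It is a textbook example, not the paper's construction: the paper concerns distance-decorated graphs built from 3D point clouds, whereas the three graphs below are hypothetical toy inputs.

```python
from collections import Counter


def wl_histogram(adjacency, rounds=3):
    """Run 1-WL colour refinement and return the histogram of final colours.

    Colours are kept as nested tuples rather than relabelled integers, so
    histograms of different graphs can be compared directly.
    """
    colors = {v: 0 for v in adjacency}
    for _ in range(rounds):
        colors = {
            v: (colors[v], tuple(sorted(colors[u] for u in adjacency[v])))
            for v in adjacency
        }
    return Counter(colors.values())


# A 6-cycle and two disjoint triangles: distinct graphs, both 2-regular.
hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
# A 6-node path, whose endpoints have degree 1.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

same = wl_histogram(hexagon) == wl_histogram(triangles)
different = wl_histogram(hexagon) != wl_histogram(path)
print(same, different)
```

The hexagon and the pair of triangles receive identical colour histograms even though they are not isomorphic, while the path is told apart after one round.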
    Global Counterfactual Explanations: Investigations, Implementations and Improvements. (arXiv:2204.06917v1 [cs.LG])
    Counterfactual explanations have been widely studied in explainability, with a range of application dependent methods emerging in fairness, recourse and model understanding. However, the major shortcoming associated with these methods is their inability to provide explanations beyond the local or instance-level. While some works touch upon the notion of a global explanation, typically suggesting to aggregate masses of local explanations in the hope of ascertaining global properties, few provide frameworks that are either reliable or computationally tractable. Meanwhile, practitioners are requesting more efficient and interactive explainability tools. We take this opportunity to investigate existing global methods, with a focus on implementing and improving Actionable Recourse Summaries (AReS), the only known global counterfactual explanation framework for recourse.
    Sparse Interaction Neighborhood Selection for Markov Random Fields via Reversible Jump and Pseudoposteriors. (arXiv:2204.05933v2 [stat.CO] UPDATED)
    We consider the problem of estimating the interacting neighborhood of a Markov Random Field model with finite support and homogeneous pairwise interactions based on relative positions of a two-dimensional lattice. Using a Bayesian framework, we propose a Reversible Jump Monte Carlo Markov Chain algorithm that jumps across subsets of a maximal range neighborhood, allowing us to perform model selection based on a marginal pseudoposterior distribution of models.
    Improving Computational Complexity in Statistical Models with Second-Order Information. (arXiv:2202.04219v3 [stat.ML] UPDATED)
    It is known that when the statistical models are singular, i.e., the Fisher information matrix at the true parameter is degenerate, the fixed step-size gradient descent algorithm takes a polynomial number of steps in terms of the sample size $n$ to converge to a final statistical radius around the true parameter, which can be unsatisfactory for the application. To further improve that computational complexity, we consider the utilization of the second-order information in the design of optimization algorithms. Specifically, we study the normalized gradient descent (NormGD) algorithm for solving parameter estimation in parametric statistical models, which is a variant of the gradient descent algorithm whose step size is scaled by the maximum eigenvalue of the Hessian matrix of the empirical loss function of statistical models. When the population loss function, i.e., the limit of the empirical loss function when $n$ goes to infinity, is homogeneous in all directions, we demonstrate that the NormGD iterates reach a final statistical radius around the true parameter after a logarithmic number of iterations in terms of $n$. Therefore, for fixed dimension $d$, the NormGD algorithm achieves the optimal overall computational complexity $\mathcal{O}(n)$ to reach the final statistical radius. This computational complexity is cheaper than that of the fixed step-size gradient descent algorithm, which is of the order $\mathcal{O}(n^{\tau})$ for some $\tau > 1$, to reach the same statistical radius. We illustrate our general theory under two statistical models: generalized linear models and mixture models, and experimental results support the predictions of our general theory.
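The contrast described in the abstract above can be seen on a one-dimensional toy problem. The sketch below assumes the hypothetical loss $f(\theta) = \theta^4$, whose Hessian vanishes at the minimizer, and compares fixed step-size gradient descent with a step divided by the Hessian (in one dimension, the Hessian is its own maximum eigenvalue). The loss, step sizes and iteration count are illustrative assumptions, not the paper's experimental setup.

```python
def grad(theta):
    """Gradient of the toy loss f(theta) = theta**4."""
    return 4.0 * theta ** 3


def hess(theta):
    """Second derivative of the toy loss; it degenerates at theta = 0."""
    return 12.0 * theta ** 2


def fixed_step_gd(theta, step=0.1, iters=50):
    """Plain gradient descent with a constant step size."""
    for _ in range(iters):
        theta -= step * grad(theta)
    return theta


def normalized_gd(theta, step=1.0, iters=50):
    """Gradient descent whose step is divided by the (scalar) Hessian."""
    for _ in range(iters):
        h = hess(theta)
        if h == 0.0:  # already at the degenerate minimizer
            break
        theta -= step * grad(theta) / h
    return theta


theta_gd = fixed_step_gd(1.0)
theta_norm = normalized_gd(1.0)
print(theta_gd, theta_norm)
```

On this loss the normalized update reduces to $\theta \leftarrow (1 - \eta/3)\,\theta$, a geometric contraction, whereas the fixed-step iterates shrink only polynomially in the iteration count.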
    Ranking Feature-Block Importance in Artificial Multiblock Neural Networks. (arXiv:2109.10279v2 [cs.LG] UPDATED)
    In artificial neural networks, understanding the contributions of input features on the prediction fosters model explainability and delivers relevant information about the dataset. While typical setups for feature importance ranking assess input features individually, in this study, we go one step further and rank the importance of groups of features, denoted as feature-blocks. A feature-block can contain features of a specific type or features derived from a particular source, which are presented to the neural network in separate input branches (multiblock ANNs). This work presents three methods pursuing distinct strategies to rank features in multiblock ANNs by their importance: (1) a composite strategy building on individual feature importance rankings, (2) a knock-in, and (3) a knock-out strategy. While the composite strategy builds on state-of-the-art feature importance rankings, knock-in and knock-out strategies evaluate the block as a whole via a mutual information criterion. Our experiments consist of a simulation study validating all three approaches, followed by a case study on two distinct real-world datasets to compare the strategies. We conclude that each strategy has its merits for specific application scenarios.
    Procrastinated Tree Search: Black-box Optimization with Delayed, Noisy, and Multi-Fidelity Feedback. (arXiv:2110.07232v2 [cs.LG] UPDATED)
    In black-box optimization problems, we aim to maximize an unknown objective function, where the function is only accessible through feedbacks of an evaluation or simulation oracle. In real-life, the feedbacks of such oracles are often noisy and available after some unknown delay that may depend on the computation time of the oracle. Additionally, if the exact evaluations are expensive but coarse approximations are available at a lower cost, the feedbacks can have multi-fidelity. In order to address this problem, we propose a generic extension of hierarchical optimistic tree search (HOO), called ProCrastinated Tree Search (PCTS), that flexibly accommodates a delay and noise-tolerant bandit algorithm. We provide a generic proof technique to quantify regret of PCTS under delayed, noisy, and multi-fidelity feedbacks. Specifically, we derive regret bounds of PCTS enabled with delayed-UCB1 (DUCB1) and delayed-UCB-V (DUCBV) algorithms. Given a horizon $T$, PCTS retains the regret bound of non-delayed HOO for expected delay of $O(\log T)$ and worsens by $O(T^{\frac{1-\alpha}{d+2}})$ for expected delays of $O(T^{1-\alpha})$ for $\alpha \in (0,1]$. We experimentally validate on multiple synthetic functions and hyperparameter tuning problems that PCTS outperforms the state-of-the-art black-box optimization methods for feedbacks with different noise levels, delays, and fidelity.
    Fairness without Imputation: A Decision Tree Approach for Fair Prediction with Missing Values. (arXiv:2109.10431v2 [cs.LG] UPDATED)
    We investigate the fairness concerns of training a machine learning model using data with missing values. Even though there are a number of fairness intervention methods in the literature, most of them require a complete training set as input. In practice, data can have missing values, and data missing patterns can depend on group attributes (e.g. gender or race). Simply applying off-the-shelf fair learning algorithms to an imputed dataset may lead to an unfair model. In this paper, we first theoretically analyze different sources of discrimination risks when training with an imputed dataset. Then, we propose an integrated approach based on decision trees that does not require a separate process of imputation and learning. Instead, we train a tree with missing incorporated as attribute (MIA), which does not require explicit imputation, and we optimize a fairness-regularized objective function. We demonstrate that our approach outperforms existing fairness intervention methods applied to an imputed dataset, through several experiments on real-world datasets.
    Non-stationary Bandits and Meta-Learning with a Small Set of Optimal Arms. (arXiv:2202.13001v3 [cs.LG] UPDATED)
    We study a sequential decision problem where the learner faces a sequence of $K$-armed stochastic bandit tasks. The tasks may be designed by an adversary, but the adversary is constrained to choose the optimal arm of each task in a smaller (but unknown) subset of $M$ arms. The task boundaries might be known (the bandit meta-learning setting), or unknown (the non-stationary bandit setting), and the number of tasks $N$ as well as the total number of rounds $T$ are known ($N$ could be unknown in the meta-learning setting). We design an algorithm based on a reduction to bandit submodular maximization, and show that its regret in both settings is smaller than the simple baseline of $\tilde{O}(\sqrt{KNT})$ that can be obtained by using standard algorithms designed for non-stationary bandit problems. For the bandit meta-learning problem with fixed task length $\tau$, we show that the regret of the algorithm is bounded as $\tilde{O}(N\sqrt{M \tau}+N^{2/3})$. Under additional assumptions on the identifiability of the optimal arms in each task, we show a bandit meta-learning algorithm with an improved $\tilde{O}(N\sqrt{M \tau}+N^{1/2})$ regret.
    Second Order Regret Bounds Against Generalized Expert Sequences under Partial Bandit Feedback. (arXiv:2204.06660v1 [cs.LG])
    We study the problem of expert advice under partial bandit feedback setting and create a sequential minimax optimal algorithm. Our algorithm works with a more general partial monitoring setting, where, in contrast to the classical bandit feedback, the losses can be revealed in an adversarial manner. Our algorithm adopts a universal prediction perspective, whose performance is analyzed with regret against a general expert selection sequence. The regret we study is against a general competition class that covers many settings (such as the switching or contextual experts settings) and the expert selection sequences in the competition class are determined by the application at hand. Our regret bounds are second order bounds in terms of the sum of squared losses and the normalized regret of our algorithm is invariant under arbitrary affine transforms of the loss sequence. Our algorithm is truly online and does not use any preliminary information about the loss sequences.  ( 2 min )
    To Split or Not to Split: The Impact of Disparate Treatment in Classification. (arXiv:2002.04788v4 [cs.LG] UPDATED)
    Disparate treatment occurs when a machine learning model yields different decisions for individuals based on a sensitive attribute (e.g., age, sex). In domains where prediction accuracy is paramount, it could potentially be acceptable to fit a model which exhibits disparate treatment. To evaluate the effect of disparate treatment, we compare the performance of split classifiers (i.e., classifiers trained and deployed separately on each group) with group-blind classifiers (i.e., classifiers which do not use a sensitive attribute). We introduce the benefit-of-splitting for quantifying the performance improvement by splitting classifiers. Computing the benefit-of-splitting directly from its definition could be intractable since it involves solving optimization problems over an infinite-dimensional functional space. Under different performance measures, we (i) prove an equivalent expression for the benefit-of-splitting which can be efficiently computed by solving small-scale convex programs; (ii) provide sharp upper and lower bounds for the benefit-of-splitting which reveal precise conditions where a group-blind classifier will always suffer from a non-trivial performance gap from the split classifiers. In the finite sample regime, splitting is not necessarily beneficial and we provide data-dependent bounds to understand this effect. Finally, we validate our theoretical results through numerical experiments on both synthetic and real-world datasets.  ( 2 min )
    Classification of Hyperspectral Images Using SVM with Shape-adaptive Reconstruction and Smoothed Total Variation. (arXiv:2203.15619v3 [cs.CV] UPDATED)
    In this work, a novel algorithm called SVM with Shape-adaptive Reconstruction and Smoothed Total Variation (SaR-SVM-STV) is introduced to classify hyperspectral images, which makes full use of spatial and spectral information. The Shape-adaptive Reconstruction (SaR) is introduced to preprocess each pixel based on the Pearson Correlation between pixels in its shape-adaptive (SA) region. Support Vector Machines (SVMs) are trained to estimate the pixel-wise probability maps of each class. Then the Smoothed Total Variation (STV) model is applied to denoise and generate the final classification map. Experiments show that SaR-SVM-STV outperforms the SVM-STV method with a few training labels, demonstrating the significance of reconstructing hyperspectral images before classification.  ( 2 min )
    Data Augmentation for Bayesian Deep Learning. (arXiv:1903.09668v3 [stat.ML] UPDATED)
    Deep Learning (DL) methods have emerged as one of the most powerful tools for functional approximation and prediction. While the representation properties of DL have been well studied, uncertainty quantification remains challenging and largely unexplored. Data augmentation techniques are a natural approach to provide uncertainty quantification and to incorporate stochastic Monte Carlo search into stochastic gradient descent (SGD) methods. The purpose of our paper is to show that training DL architectures with data augmentation leads to efficiency gains. We use the theory of scale mixtures of normals to derive data augmentation strategies for deep learning. This allows variants of the expectation-maximization and MCMC algorithms to be brought to bear on these high dimensional nonlinear deep learning models. To demonstrate our methodology, we develop data augmentation algorithms for a variety of commonly used activation functions: logit, ReLU, leaky ReLU and SVM. Our methodology is compared to traditional stochastic gradient descent with back-propagation. Our optimization procedure leads to a version of iteratively re-weighted least squares and can be implemented at scale with accelerated linear algebra methods providing substantial improvement in speed. We illustrate our methodology on a number of standard datasets. Finally, we conclude with directions for future research.  ( 2 min )
    Observable adjustments in single-index models for regularized M-estimators. (arXiv:2204.06990v1 [math.ST])
    We consider observations $(X,y)$ from single index models with unknown link function, Gaussian covariates and a regularized M-estimator $\hat\beta$ constructed from a convex loss function and regularizer. In the regime where sample size $n$ and dimension $p$ are both increasing such that $p/n$ has a finite limit, the behavior of the empirical distribution of $\hat\beta$ and the predicted values $X\hat\beta$ has been previously characterized in a number of models: The empirical distributions are known to converge to proximal operators of the loss and penalty in a related Gaussian sequence model, which captures the interplay between ratio $p/n$, loss, regularization and the data generating process. This connection between $(\hat\beta,X\hat\beta)$ and the corresponding proximal operators requires solving fixed-point equations that typically involve unobservable quantities such as the prior distribution on the index or the link function. This paper develops a different theory to describe the empirical distribution of $\hat\beta$ and $X\hat\beta$: Approximations of $(\hat\beta,X\hat\beta)$ in terms of proximal operators are provided that only involve observable adjustments. These proposed observable adjustments are data-driven, e.g., do not require prior knowledge of the index or the link function. These new adjustments yield confidence intervals for individual components of the index, as well as estimators of the correlation of $\hat\beta$ with the index. The interplay between loss, regularization and the model is thus captured in a data-driven manner, without solving the fixed-point equations studied in previous works. The results apply to both strongly convex regularizers and unregularized M-estimation. Simulations are provided for the square and logistic loss in single index models including logistic regression and 1-bit compressed sensing with 20\% corrupted bits.  ( 2 min )
    Concentration of Random Feature Matrices in High-Dimensions. (arXiv:2204.06935v1 [stat.ML])
    The spectra of random feature matrices provide essential information on the conditioning of the linear system used in random feature regression problems and are thus connected to the consistency and generalization of random feature models. Random feature matrices are asymmetric rectangular nonlinear matrices depending on two input variables, the data and the weights, which can make their characterization challenging. We consider two settings for the two input variables, either both are random variables or one is a random variable and the other is well-separated, i.e. there is a minimum distance between points. With conditions on the dimension, the complexity ratio, and the sampling variance, we show that the singular values of these matrices concentrate near their full expectation and near one with high probability. In particular, since the dimension depends only on the logarithm of the number of random weights or the number of data points, our complexity bounds can be achieved even in moderate dimensions for many practical settings. The theoretical results are verified with numerical experiments.  ( 2 min )
    Modelling Non-Smooth Signals with Complex Spectral Structure. (arXiv:2203.06997v2 [stat.ML] UPDATED)
    The Gaussian Process Convolution Model (GPCM; Tobar et al., 2015a) is a model for signals with complex spectral structure. A significant limitation of the GPCM is that it assumes a rapidly decaying spectrum: it can only model smooth signals. Moreover, inference in the GPCM currently requires (1) a mean-field assumption, resulting in poorly calibrated uncertainties, and (2) a tedious variational optimisation of large covariance matrices. We redesign the GPCM model to induce a richer distribution over the spectrum with relaxed assumptions about smoothness: the Causal Gaussian Process Convolution Model (CGPCM) introduces a causality assumption into the GPCM, and the Rough Gaussian Process Convolution Model (RGPCM) can be interpreted as a Bayesian nonparametric generalisation of the fractional Ornstein-Uhlenbeck process. We also propose a more effective variational inference scheme, going beyond the mean-field assumption: we design a Gibbs sampler which directly samples from the optimal variational solution, circumventing any variational optimisation entirely. The proposed variations of the GPCM are validated in experiments on synthetic and real-world data, showing promising results.  ( 2 min )
    Wassmap: Wasserstein Isometric Mapping for Image Manifold Learning. (arXiv:2204.06645v1 [cs.LG])
    In this paper, we propose Wasserstein Isometric Mapping (Wassmap), a parameter-free nonlinear dimensionality reduction technique that provides solutions to some drawbacks in existing global nonlinear dimensionality reduction algorithms in imaging applications. Wassmap represents images via probability measures in Wasserstein space, then uses pairwise quadratic Wasserstein distances between the associated measures to produce a low-dimensional, approximately isometric embedding. We show that the algorithm is able to exactly recover parameters of some image manifolds including those generated by translations or dilations of a fixed generating measure. Additionally, we show that a discrete version of the algorithm retrieves parameters from manifolds generated from discrete measures by providing a theoretical bridge to transfer recovery results from functional data to discrete data. Testing of the proposed algorithms on various image data manifolds shows that Wassmap yields good embeddings compared with other global techniques.  ( 2 min )
    Regret, stability & fairness in matching markets with bandit learners. (arXiv:2102.06246v2 [cs.LG] UPDATED)
    Making an informed decision -- for example, when choosing a career or housing -- requires knowledge about the available options. Such knowledge is generally acquired through costly trial and error, but this learning process can be disrupted by competition. In this work, we study how competition affects the long-term outcomes of individuals as they learn. We build on a line of work that models this setting as a two-sided matching market with bandit learners. A recent result in this area states that it is impossible to simultaneously guarantee two natural desiderata: stability and low optimal regret for all agents. Resource-allocating platforms can point to this result as a justification for assigning good long-term outcomes to some agents and poor ones to others. We show that this impossibility need not hold true. In particular, by modeling two additional components of competition -- namely, costs and transfers -- we prove that it is possible to simultaneously guarantee four desiderata: stability, low optimal regret, fairness in the distribution of regret, and high social welfare.  ( 2 min )
    Achieving Representative Data via Convex Hull Feasibility Sampling Algorithms. (arXiv:2204.06664v1 [stat.ML])
    Sampling biases in training data are a major source of algorithmic biases in machine learning systems. Although there are many methods that attempt to mitigate such algorithmic biases during training, the most direct and obvious way is simply collecting more representative training data. In this paper, we consider the task of assembling a training dataset in which minority groups are adequately represented from a given set of data sources. In essence, this is an adaptive sampling problem to determine if a given point lies in the convex hull of the means from a set of unknown distributions. We present adaptive sampling methods to determine, with high confidence, whether it is possible to assemble a representative dataset from the given data sources. We also demonstrate the efficacy of our policies in simulations in the Bernoulli and a multinomial setting.  ( 2 min )
    Group-Sparse Matrix Factorization for Transfer Learning of Word Embeddings. (arXiv:2104.08928v2 [stat.ML] UPDATED)
    Unstructured text provides decision-makers with a rich data source in many domains, ranging from product reviews in retailing to nursing notes in healthcare. To leverage this information, words are typically translated into word embeddings -- vectors that encode the semantic relationships between words -- through unsupervised learning algorithms such as matrix factorization. However, learning word embeddings from new domains with limited training data can be challenging, because the meaning/usage may be different in the new domain, e.g., the word "positive" typically has positive sentiment, but often has negative sentiment in medical notes since it may imply that a patient is tested positive for a disease. Intuitively, we expect that only a small number of domain-specific words may have new meanings/usages. We propose an intuitive two-stage estimator that exploits this structure via a group-sparse penalty to efficiently transfer learn domain-specific word embeddings by combining large-scale text corpora (such as Wikipedia) with limited domain-specific text data. We bound the generalization error of our estimator, proving that it can achieve the same accuracy (compared to not transfer learning) with substantially less domain-specific data when only a small number of embeddings are altered between domains. Our results provide the first bounds on group-sparse matrix factorization, which may be of independent interest. We empirically evaluate the effectiveness of our approach compared to state-of-the-art fine-tuning heuristics from natural language processing.  ( 2 min )
    Finding MNEMON: Reviving Memories of Node Embeddings. (arXiv:2204.06963v1 [cs.LG])
    Previous security research efforts orbiting around graphs have been exclusively focusing on either (de-)anonymizing the graphs or understanding the security and privacy issues of graph neural networks. Little attention has been paid to understand the privacy risks of integrating the output from graph embedding models (e.g., node embeddings) with complex downstream machine learning pipelines. In this paper, we fill this gap and propose a novel model-agnostic graph recovery attack that exploits the implicit graph structural information preserved in the embeddings of graph nodes. We show that an adversary can recover edges with decent accuracy by only gaining access to the node embedding matrix of the original graph without interactions with the node embedding models. We demonstrate the effectiveness and applicability of our graph recovery attack through extensive experiments.  ( 2 min )
    Streamable Neural Audio Synthesis With Non-Causal Convolutions. (arXiv:2204.07064v1 [cs.SD])
    Deep learning models are mostly used in an offline inference fashion. However, this strongly limits the use of these models inside audio generation setups, as most creative workflows are based on real-time digital signal processing. Although approaches based on recurrent networks can be naturally adapted to this buffer-based computation, the use of convolutions still poses some serious challenges. To tackle this issue, the use of causal streaming convolutions has been proposed. However, this requires specific, more complex training and can impact the resulting audio quality. In this paper, we introduce a new method that produces non-causal streaming models. This makes any convolutional model compatible with real-time buffer-based processing. As our method is based on a post-training reconfiguration of the model, we show that it is able to transform models trained without causal constraints into a streaming model. We show how our method can be adapted to fit complex architectures with parallel branches. To evaluate our method, we apply it on the recent RAVE model, which provides high-quality real-time audio synthesis. We test our approach on multiple music and speech datasets and show that it is faster than overlap-add methods, while having no impact on the generation quality. Finally, we introduce two open-source implementations of our work as Max/MSP and PureData externals, and as a VST audio plugin. This endows traditional digital audio workstations with real-time neural audio synthesis on a laptop CPU.  ( 2 min )
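The buffer-based processing mentioned in the abstract above can be sketched for a single 1D convolution. The code below illustrates the general cached-padding idea only (keep the last k-1 input samples between buffers) and is not the paper's reconfiguration method; the class name `StreamingConv`, the kernel and the signal are hypothetical.

```python
def conv_valid(signal, kernel):
    """'Valid' sliding dot product of a kernel over a full signal."""
    k = len(kernel)
    return [
        sum(kernel[j] * signal[i + j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


class StreamingConv:
    """Convolves buffer by buffer, caching the last k-1 input samples."""

    def __init__(self, kernel):
        self.kernel = list(kernel)
        self.cache = [0.0] * (len(self.kernel) - 1)  # initial zero padding

    def process(self, buffer):
        padded = self.cache + list(buffer)
        out = conv_valid(padded, self.kernel)
        # Keep the tail of the input for the next call.
        self.cache = padded[len(padded) - (len(self.kernel) - 1):]
        return out


kernel = [0.25, 0.5, 0.25]
signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

# Offline reference: left-pad the whole signal once, then convolve.
offline = conv_valid([0.0] * (len(kernel) - 1) + signal, kernel)

# Streaming: feed two buffers of four samples each.
stream = StreamingConv(kernel)
streamed = stream.process(signal[:4]) + stream.process(signal[4:])
print(streamed == offline)
```

Each call returns as many samples as it receives, so the layer fits a fixed-size audio callback. For a centred (non-causal) three-tap kernel, this output corresponds to the centred result delayed by one sample.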
    Kernel Thinning. (arXiv:2105.05842v7 [stat.ML] UPDATED)
    We introduce kernel thinning, a new procedure for compressing a distribution $\mathbb{P}$ more effectively than i.i.d. sampling or standard thinning. Given a suitable reproducing kernel $\mathbf{k}$ and $\mathcal{O}(n^2)$ time, kernel thinning compresses an $n$-point approximation to $\mathbb{P}$ into a $\sqrt{n}$-point approximation with comparable worst-case integration error across the associated reproducing kernel Hilbert space. With high probability, the maximum discrepancy in integration error is $\mathcal{O}_d(n^{-1/2}\sqrt{\log n})$ for compactly supported $\mathbb{P}$ and $\mathcal{O}_d(n^{-\frac{1}{2}} (\log n)^{(d+1)/2}\sqrt{\log\log n})$ for sub-exponential $\mathbb{P}$ on $\mathbb{R}^d$. In contrast, an equal-sized i.i.d. sample from $\mathbb{P}$ suffers $\Omega(n^{-1/4})$ integration error. Our sub-exponential guarantees resemble the classical quasi-Monte Carlo error rates for uniform $\mathbb{P}$ on $[0,1]^d$ but apply to general distributions on $\mathbb{R}^d$ and a wide range of common kernels. We use our results to derive explicit non-asymptotic maximum mean discrepancy bounds for Gaussian, Mat\'ern, and B-spline kernels and present two vignettes illustrating the practical benefits of kernel thinning over i.i.d. sampling and standard Markov chain Monte Carlo thinning, in dimensions $d=2$ through $100$.  ( 2 min )
    Program Analysis of Probabilistic Programs. (arXiv:2204.06868v1 [cs.PL])
    Probabilistic programming is a growing area that strives to make statistical analysis more accessible, by separating probabilistic modelling from probabilistic inference. In practice this decoupling is difficult. No single inference algorithm can be used as a probabilistic programming back-end that is simultaneously reliable, efficient, black-box, and general. Probabilistic programming languages often choose a single algorithm to apply to a given problem, thus inheriting its limitations. While substantial work has been done both to formalise probabilistic programming and to improve efficiency of inference, there has been little work that makes use of the available program structure, by formally analysing it, to better utilise the underlying inference algorithm. This dissertation presents three novel techniques (both static and dynamic), which aim to improve probabilistic programming using program analysis. The techniques analyse a probabilistic program and adapt it to make inference more efficient, sometimes in a way that would have been tedious or impossible to do by hand.  ( 2 min )
    Optimal Stopping via Randomized Neural Networks. (arXiv:2104.13669v2 [stat.ML] UPDATED)
    This paper presents new machine learning approaches to approximate the solutions of optimal stopping problems. The key idea of these methods is to use neural networks, where the parameters of the hidden layers are generated randomly and only the last layer is trained, in order to approximate the continuation value. Our approaches are applicable to high dimensional problems where the existing approaches become increasingly impractical. In addition, since our approaches can be optimized using simple linear regression, they are easy to implement and theoretical guarantees are provided. Our randomized reinforcement learning approach and randomized recurrent neural network approach outperform the state-of-the-art and other relevant machine learning approaches in Markovian and non-Markovian examples, respectively. In particular, we test our approaches on Black-Scholes, Heston, rough Heston and fractional Brownian motion. Moreover, we show that they can also be used to efficiently compute Greeks of American options.  ( 2 min )
    Semi-Discriminative Representation Loss for Online Continual Learning. (arXiv:2006.11234v4 [stat.ML] UPDATED)
    The use of episodic memory in continual learning has demonstrated effectiveness for alleviating catastrophic forgetting. In recent studies, gradient-based approaches have been developed to make more efficient use of compact episodic memory. Such approaches refine the gradients resulting from new samples by those from memorized samples, aiming to reduce the diversity of gradients from different tasks. In this paper, we clarify the relation between diversity of gradients and discriminativeness of representations, showing shared as well as conflicting interests between Deep Metric Learning and continual learning, thus demonstrating pros and cons of learning discriminative representations in continual learning. Based on these findings, we propose a simple method -- Semi-Discriminative Representation Loss (SDRL) -- for continual learning. In comparison with state-of-the-art methods, SDRL shows better performance with low computational cost on multiple benchmark tasks in the setting of online continual learning.  ( 2 min )
    Optimal Training of Fair Predictive Models. (arXiv:1910.04109v3 [stat.ML] UPDATED)
    Recently there has been sustained interest in modifying prediction algorithms to satisfy fairness constraints. These constraints are typically complex nonlinear functionals of the observed data distribution. Focusing on the path-specific causal constraints proposed by Nabi and Shpitser (2018), we introduce new theoretical results and optimization techniques to make model training easier and more accurate. Specifically, we show how to reparameterize the observed data likelihood such that fairness constraints correspond directly to parameters that appear in the likelihood, transforming a complex constrained optimization objective into a simple optimization problem with box constraints. We also exploit methods from empirical likelihood theory in statistics to improve predictive performance by constraining baseline covariates, without requiring parametric models. We combine the merits of both proposals to optimize a hybrid reparameterized likelihood. The techniques presented here should be applicable more broadly to fair prediction proposals that impose constraints on predictive models.  ( 2 min )
    Learning Optimal Dynamic Treatment Regimes Using Causal Tree Methods in Medicine. (arXiv:2204.07124v1 [stat.ML])
    Dynamic treatment regimes (DTRs) are used in medicine to tailor sequential treatment decisions to patients by considering patient heterogeneity. Common methods for learning optimal DTRs, however, have shortcomings: they are typically based on outcome prediction and not treatment effect estimation, or they use linear models that are restrictive for patient data from modern electronic health records. To address these shortcomings, we develop two novel methods for learning optimal DTRs that effectively handle complex patient data. We call our methods DTR-CT and DTR-CF. Our methods are based on a data-driven estimation of heterogeneous treatment effects using causal tree methods, specifically causal trees and causal forests, that learn non-linear relationships, control for time-varying confounding, are doubly robust, and explainable. To the best of our knowledge, our paper is the first that adapts causal tree methods for learning optimal DTRs. We evaluate our proposed methods using synthetic data and then apply them to real-world data from intensive care units. Our methods outperform state-of-the-art baselines in terms of cumulative regret and percentage of optimal decisions by a considerable margin. Our work improves treatment recommendations from electronic health record and is thus of direct relevance for personalized medicine.  ( 2 min )
    Gradient boosting for convex cone predict and optimize problems. (arXiv:2204.06895v1 [cs.LG])
    Many problems in engineering and statistics involve both predictive forecasting and decision-based optimization. Traditionally, predictive models are optimized independently from the final decision-based optimization problem. In contrast, a `smart, predict then optimize' (SPO) framework optimizes prediction models to explicitly minimize the final downstream decision loss. In this paper we present dboost, a gradient boosting algorithm for training prediction model ensembles to minimize decision regret. The dboost framework supports any convex optimization program that can be cast as convex quadratic cone program and gradient boosting is performed by implicit differentiation of a custom fixed-point mapping. To our knowledge, the dboost framework is the first general purpose implementation of gradient boosting to predict and optimize problems. Experimental results comparing with state-of-the-art SPO methods show that dboost can further reduce out-of-sample decision regret.  ( 2 min )
    Streamlined Variational Inference for Linear Mixed Models with Crossed Random Effects. (arXiv:1910.01799v3 [stat.ME] UPDATED)
    We derive streamlined mean field variational Bayes algorithms for fitting linear mixed models with crossed random effects. In the most general situation, where the dimensions of the crossed groups are arbitrarily large, streamlining is hindered by lack of sparseness in the underlying least squares system. Because of this fact we also consider a hierarchy of relaxations of the mean field product restriction. The least stringent product restriction delivers a high degree of inferential accuracy. However, this accuracy must be mitigated against its higher storage and computing demands. Faster sparse storage and computing alternatives are also provided, but come with the price of diminished inferential accuracy. This article provides full algorithmic details of three variational inference strategies, presents detailed empirical results on their pros and cons and, thus, guides the users on their choice of variational inference approach depending on the problem size and computing resources.  ( 2 min )
    Using Machine Learning for Particle Identification in ALICE. (arXiv:2204.06900v1 [nucl-ex])
    Particle identification (PID) is one of the main strengths of the ALICE experiment at the LHC. It is a crucial ingredient for detailed studies of the strongly interacting matter formed in ultrarelativistic heavy-ion collisions. ALICE provides PID information via various experimental techniques, allowing for the identification of particles over a broad momentum range (from around 100 MeV/$c$ to around 50 GeV/$c$). The main challenge is how to combine the information from various detectors effectively. Therefore, PID represents a model classification problem, which can be addressed using Machine Learning (ML) solutions. Moreover, the complexity of the detector and richness of the detection techniques make PID an interesting area of research also for the computer science community. In this work, we show the current status of the ML approach to PID in ALICE. We discuss the preliminary work with the Random Forest approach for the LHC Run 2 and a more advanced solution based on Domain Adaptation Neural Networks, including a proposal for its future implementation within the ALICE computing software for the upcoming LHC Run 3.  ( 2 min )

  • Open

    [D] What is the difference between channel-wise and self attention in this case?
    Example: I fed 32 feature maps of dimension 6x6x32 into a Squeeze and Excitation layer, which assigns a weight to each of my channels through a channel-wise attention mechanism. What is the difference between this and passing these 32 feature maps into a Hybrid Transformer Encoder with patches of dimension 6x6 (so one patch per channel)? As I understand it, channel attention says "which channel is important for the final prediction", while a transformer (with self-attention) tells us "where to focus our attention in a given context". Isn't that the same if the patches are the channels? Basically it tells us which patch to focus on, and if patch = channel, then squeeze-and-excitation = self-attention? submitted by /u/Rogitus [link] [comments]  ( 1 min )
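To make the channel-wise side of the question concrete, here is a minimal pure-Python sketch of a squeeze-and-excitation gate. The shapes (32 channels of 6x6), the reduction ratio of 4, and the random weights are assumptions for illustration only; they are not taken from the poster's model.

```python
import math
import random

def squeeze_excite(x, w1, w2):
    """x: list of C channels, each an H x W list of lists.

    Returns (per-channel gates, rescaled feature maps).
    """
    # Squeeze: global average pooling gives one scalar per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation: bottleneck layer with ReLU, then a sigmoid gate per channel.
    hidden = [max(0.0, sum(w * zc for w, zc in zip(row, z))) for row in w1]
    s = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
         for row in w2]
    # Scale: every spatial position of channel c is multiplied by the same s[c].
    y = [[[v * sc for v in row] for row in ch] for ch, sc in zip(x, s)]
    return s, y

rng = random.Random(0)
C, H, W, R = 32, 6, 6, 4  # assumed sizes; R is the reduction ratio
x = [[[rng.gauss(0, 1) for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[rng.gauss(0, 0.1) for _ in range(C)] for _ in range(C // R)]
w2 = [[rng.gauss(0, 0.1) for _ in range(C // R)] for _ in range(C)]
s, y = squeeze_excite(x, w1, w2)
```

In this sketch each channel receives a single scalar gate computed from pooled statistics, and channels are only rescaled, never mixed. Self-attention over one token per channel would instead compute a C x C matrix of pairwise weights and output weighted mixtures of the tokens, which is one way to see that the two mechanisms are related but not identical.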
    [P] Bounding.ai Launches New Marketplace for AI Labeled Data
    In a new announcement, Bounding.ai launched its marketplace for computer vision and AI teams to access training data easily. The platform is designed to empower individuals and small companies around the world to create and sell datasets that will be instantly accessible by any team in need of labeled data. Bounding.ai Launches New Marketplace for AI Labeled Data & $5,000 Prize submitted by /u/Freyr_AI [link] [comments]  ( 1 min )
    [D]Unsupervised classification of words/phrases?
    I have found most unsupervised text classification methods to be mostly suitable for classifying documents containing relatively large amounts of words/sentences. However, I have a dataset with entries containing only single words or phrases but not full sentences. The goal is to do unsupervised semantic classification on these words/phrases. Are there any existing algorithms for such a task? submitted by /u/Comprehensive-Egg707 [link] [comments]  ( 1 min )
    [N] Robot Arm Acts As "Hand And Eyes" of Language Model To Execute Real World Tasks With SayCan And Robotics At Google
    Large language models can encode a wealth of semantic knowledge about the world. Such knowledge could in principle be extremely useful to robots aiming to act upon high-level, temporally extended instructions expressed in natural language. However, a significant weakness of language models is that they lack contextual grounding, which makes it difficult to leverage them for decision making within a given real-world context. For example, asking a language model to describe how to clean a spill might result in a reasonable narrative, but it may not be applicable to a particular agent, such as a robot, that needs to perform this task in a particular environment. We propose to provide this grounding by means of pretrained behaviors, which are used to condition the model to propose natural language actions that are both feasible and contextually appropriate. The robot can act as the language model’s “hands and eyes,” while the language model supplies high-level semantic knowledge about the task. We show how low-level tasks can be combined with large language models so that the language model provides high-level knowledge about the procedures for performing complex and temporally extended instructions, while value functions associated with these tasks provide the grounding necessary to connect this knowledge to a particular physical environment. We evaluate our method on a number of real-world robotic tasks, where we show that this approach is capable of completing long-horizon, abstract, natural language instructions on a mobile manipulator. Github: https://say-can.github.io/ Video of Robot Executing Commands: https://youtu.be/zOph99BjRqs?t=4 submitted by /u/SlightSituation [link] [comments]  ( 1 min )
    [D] AskScience AMA Series: We are seven leading scientists specializing in the intersection of machine learning and neuroscience. Ask Us Anything about computational neuroscience or science education!
    submitted by /u/blueneuronDOTnet [link] [comments]  ( 2 min )
    [D] How DALL-E 2 Actually Works
    Here's a video explaining the overall architecture of DALL-E 2 and how it actually works! Great overview for those who haven't had time to read the paper: How does DALL-E 2 actually work? submitted by /u/SleekEagle [link] [comments]  ( 1 min )
    [N] Announcing the Learning on Graphs Conference!
    We think this new venue will be valuable for the Graph/Geometric Machine Learning community. Why? See our blogpost: https://michael-bronstein.medium.com/announcing-the-learning-on-graphs-conference-c63caed7347 The LoG Conference key facts: - Covers work broadly related to machine learning on graphs and geometry - Proceedings track published in PMLR - Also has a non-archival extended abstract track - Double blind review process on OpenReview - Top reviewers receive monetary rewards - First year: virtual December 9-12 2022, free to attend. Call for papers: https://logconference.github.io/cfp/ Stay updated via Twitter: https://twitter.com/LogConference Or LinkedIn: https://www.linkedin.com/company/log-conference Advisory board: Regina Barzilay (MIT), Xavier Bresson (NUS), Michael Bronstein (Oxford/Twitter), Stephan Günnemann (TUM), Stefanie Jegelka (MIT), Jure Leskovec (Stanford), Pietro Liò (Cambridge), Jian Tang (MILA/HEC Montreal), Jie Tang (Tsinghua), Petar Veličković (DeepMind), Soledad Villar (JHU), Marinka Zitnik (Harvard). Organizers: Yuanqi Du (DP Technology), Hannes Stärk (MIT), Derek Lim (MIT), Chaitanya Joshi (Cambridge), Andreea-Ioana Deac (Mila), Iulia Duta (Cambridge), Joshua Robinson (MIT). submitted by /u/Hannes-Stark [link] [comments]  ( 1 min )
  • Open

    New to machine learning, want to simulate robotics in a 3d environment
    My employer makes significant use of robotic weld cells, and while working with the equipment I've noticed what seems to be room for improvement in the programming. This is purely a personal academic project, as I am quite curious on if machine learning could produce comparable or superior results to the human-made programming used at work. However, as such there will unfortunately be areas of vagueness because I need to stick to knowledge that is publicly available regarding their operations. I'm going to have to stick to more generic, publicly available reference material, and will not be able to share most, if any, of the end result. I would like to run simulations in a 3D environment, using machine learning to train a computer program to find the most efficient sequence of movements &…  ( 3 min )
    Does using a centralized critic always mean that the agents receive global observation?
    submitted by /u/No_Possibility_7588 [link] [comments]
    What algorithm would be suited for a “Just do it as good as you can” situation?
    I’m really new to RL so please bear with me if I’m making mistakes here, but I’m trying to make an environment that emulates a network of roads. The algorithm will need to generate a quick route between n destinations, where n is a number with an insane amount of permutations, like 30 for example. This is like emulating the destinations required by a mailman’s route on a map, and trying to find the fastest way to get to each one. The algorithm’s sequence of decisions will be choosing a node to travel to, where each node represents a street intersection or a point where the street ends. Once it has traveled to every destination using the nodes, it’ll review the network of nodes it used and sum the distances between them to get the total distance of the route. The goal is to get the total distance as small as possible. Is this realistic for an RL problem, or do I need to try to engineer some way to determine whether every decision was good or bad? Could I build a mathematical way to approximate the quickest route and then reward the RL algorithm for generating a better route than the mathematically approximated one? I could try rewarding the algorithm at each decision by whether it reduced the total distance required to reach any target it has yet to visit. I could try to mathematically make this more viable… what do y’all think, should I do something like that? Am I headed in the right direction? Thanks for any and all help! submitted by /u/professorDissociate [link] [comments]  ( 3 min )
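The idea of rewarding the agent relative to a mathematically approximated route can be sketched as follows. This is only an illustration under simplifying assumptions: destinations are random 2-D points, distances are straight-line rather than road-network distances, and the baseline is a plain nearest-neighbor heuristic. All function names are hypothetical.

```python
import math
import random

def route_length(points, order):
    """Total length of visiting the points in the given order (no return leg)."""
    return sum(math.dist(points[a], points[b]) for a, b in zip(order, order[1:]))

def nearest_neighbor_route(points, start=0):
    """Greedy baseline: always travel to the closest unvisited destination."""
    unvisited = set(range(len(points))) - {start}
    order = [start]
    while unvisited:
        here = points[order[-1]]
        nxt = min(unvisited, key=lambda i: math.dist(here, points[i]))
        unvisited.remove(nxt)
        order.append(nxt)
    return order

def episode_reward(points, agent_order, baseline_order):
    """Positive when the agent's route is shorter than the baseline route."""
    baseline = route_length(points, baseline_order)
    return (baseline - route_length(points, agent_order)) / baseline

rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(30)]
baseline_order = nearest_neighbor_route(points)
agent_order = list(range(30))  # stand-in for a route proposed by the agent
reward = episode_reward(points, agent_order, baseline_order)
```

A reward of this form is sparse (one number per finished route), so it may still need to be combined with some per-step signal, such as the negative length of each leg travelled.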
    Where is env.nS for Frozen Lake in OpenAI Gym
    I am trying to run this: env4 = FrozenLakeEnv(map_name='4x4', is_slippery=False); env4.nS I then get this error: 'FrozenLakeEnv' object has no attribute 'nS' But I see it in the source code on line 151 and 152: https://github.com/openai/gym/blob/master/gym/envs/toy_text/frozen_lake.py Edit: I'm trying to follow along with some tutorials online. Thank you for the help! submitted by /u/postdoc403b [link] [comments]  ( 1 min )
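For readers following the same tutorials: the quantity that nS stands for is the number of grid cells, so it can be derived from the map itself without relying on that attribute. The sketch below assumes the standard 4x4 layout and uses hypothetical helper names. As far as I know, Gym's discrete spaces also expose these sizes as env.observation_space.n and env.action_space.n, though that is not checked here against the poster's installed version.

```python
# Standard 4x4 FrozenLake layout: S = start, F = frozen, H = hole, G = goal.
MAP_4X4 = ["SFFF", "FHFH", "FFFH", "HFFG"]

def count_states_and_actions(desc):
    """Return (number of states, number of actions) for a grid description."""
    n_rows, n_cols = len(desc), len(desc[0])
    n_states = n_rows * n_cols  # one state per grid cell
    n_actions = 4               # left, down, right, up
    return n_states, n_actions

def to_state(row, col, n_cols):
    """Row-major index of a grid cell, matching a flat state numbering."""
    return row * n_cols + col

n_states, n_actions = count_states_and_actions(MAP_4X4)
goal_state = to_state(3, 3, len(MAP_4X4[0]))
```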
    Getting max/min action in DDPG and TD3
    I am using DDPG for a custom environment. My reward is positive (the sum-rate in a communication system). My problem is that I get the max or min action after a few training steps and it saturates with a non-optimized solution. How can I address this problem? I tried redesigning my reward to include positive and negative values but it didn’t work. I read that some people are using reward scaling. What is it and how would I scale it? I mean is there a specific method? I couldn’t find enough resources on that. Any help is much appreciated! submitted by /u/alicefaisal [link] [comments]  ( 1 min )
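One common reading of "reward scaling" is dividing each reward by a running estimate of the reward standard deviation before it is stored or used in the critic update. The class below is a minimal sketch of that idea using Welford's online variance algorithm. The class name and the eps default are assumptions, and this is not tied to any particular DDPG or TD3 implementation.

```python
import math

class RunningRewardScaler:
    """Scales rewards by a running standard deviation (Welford's algorithm)."""

    def __init__(self, eps=1e-8):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0   # running sum of squared deviations from the mean
        self.eps = eps  # avoids division by zero

    def update(self, reward):
        self.count += 1
        delta = reward - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (reward - self.mean)

    def std(self):
        # With fewer than two samples the variance is undefined; use 1.0.
        if self.count < 2:
            return 1.0
        return math.sqrt(self.m2 / (self.count - 1))

    def scale(self, reward):
        """Update the statistics, then return the rescaled reward."""
        self.update(reward)
        return reward / (self.std() + self.eps)

scaler = RunningRewardScaler()
scaled = [scaler.scale(r) for r in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]]
```

Dividing without subtracting the mean keeps the sign of every reward, so an all-positive sum-rate reward stays positive and only its magnitude changes.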
    Question about pseudocodes
    Hi, I'm redoing all the RL algorithms in Python to understand them better. I'm mostly following Sutton and Barto, but the pseudocode there is often hard to follow. Do you know of any other place I can look? submitted by /u/New_neanderthal [link] [comments]  ( 1 min )
    Industry use of reinforcement learning
    I have been studying RL for 18 months now with the goal of getting a job in it. Yet when I look at jobs, I very seldom see postings about it. I am wondering why that is the case. From my current understanding, I could think of dozens of applications with huge potential gains. It feels like untapped potential. Or am I missing something? What do you think is the big obstacle to wider adoption of RL? Do you think it overlaps with classical control at the moment and is not justified? submitted by /u/Ouassimf [link] [comments]  ( 4 min )
    Comparing Default VS Custom Reward Function for Optimal Health Management of a DeepRL Agent Playing Tekken
    submitted by /u/DIAMBRA_AIArena [link] [comments]  ( 2 min )
  • Open

    Fine-tune and deploy a Wav2Vec2 model for speech recognition with Hugging Face and Amazon SageMaker
    Automatic speech recognition (ASR) is a commonly used machine learning (ML) technology in our daily lives and business scenarios. Applications such as voice-controlled assistants like Alexa and Siri, and voice-to-text applications like automatic subtitling for videos and transcribing meetings, are all powered by this technology. These applications take audio clips as input and convert speech […]  ( 11 min )
    Build a virtual credit approval agent with Amazon Lex, Amazon Textract, and Amazon Connect
    Banking and financial institutions review thousands of credit applications per week. The credit approval process requires financial organizations to invest time and resources in reviewing documents like W2s, bank statements, and utility bills. The overall experience can be costly for the organization. At the same time, organizations have to consider borrowers, who are waiting for […]  ( 8 min )
  • Open

    AWS Cloud Migration: All You Need to Know
    Businesses today face myriad challenges, some of which are successfully addressed with help from cloud computing. This is where AWS cloud migration comes in: it promises to be a boon for businesses grappling with a sudden increase in traffic or for those looking for accelerated app deployment. It is also handy for cautious businesses that… The post AWS Cloud Migration: All You Need to Know appeared first on Data Science Central.  ( 3 min )
  • Open

    Web Crawling in Python
    In the old days, it was a tedious job to collect data, and sometimes very expensive. Machine learning projects cannot […] The post Web Crawling in Python appeared first on Machine Learning Mastery.  ( 12 min )
  • Open

    Kamikaze Drones in Russia’s War Against Ukraine Point to Future "Killer Robots"
    submitted by /u/regalalgorithm [link] [comments]  ( 1 min )
    AI News | Breakthrough AI Robot Arm Understanding From Google | OpenAI DALL-E 2 | AI Edge Computing In Space
    submitted by /u/getrich_or_diemining [link] [comments]  ( 1 min )
    DALL-E (Zero-Shot Text-to-Image Generation) -PART(1/2)
    OpenAI released DALL-E 2 last week; this system is basically capable of generating an image from a text description. Some of the results were truly amazing. In this blog, I tried to discuss the ideas around DALL-E (version 1). DALL-E consists of two main components: a d-VAE (discrete Variational Auto-Encoder) and an auto-regressive transformer. In Part 1 I focused on the d-VAE, where I tried to talk about the basic VAE and its ELBO formulation, and the VQ-VAE that eventually leads to the d-VAE. Its reconstruction loss is formulated from the Logit-Laplace (bounded) distribution, unlike the typical L1 or L2. Overall, this part explains how a discrete vector (token) can be generated for an input image. submitted by /u/rakshith291 [link] [comments]  ( 1 min )
    My first attempt at machine learning. I made a cool chatbot 😎
    I made a self learning conversational chatbot in ReactJS. It does nothing but reply to user messages and only understands text, for now 😄 https://xalen.netlify.app What do you think? Yea or Nay? submitted by /u/GameTide [link] [comments]  ( 1 min )
  • Open

    Startup Transforms Meeting Notes With Time-Saving Features
    Gil Makleff and Artem Koren are developing AI for meeting transcripts, creating time-savers like shareable highlights of the text that is often TL;DR (too long; didn’t read). The Sembly founders conceived the idea after years of working in enterprise operational consulting at UMT Consulting Group, which was acquired by Ernst & Young. “We had an […] The post Startup Transforms Meeting Notes With Time-Saving Features appeared first on NVIDIA Blog.  ( 3 min )
  • Open

    Data Scientists vs. BI Developer: What’s the Difference?
    Here’s the truth.  ( 1 min )
2022-05-15T01:10:15.055Z osmosfeed 1.14.4